LMCocktail-Mistral-7B-v1/scripts/merge.log

create a temporary directory: /tmp/tmpc0jeswnf
loading /media/hangyu5/Home/Documents/Hugging-Face/LM_cocktail/Mistral-7B-Instruct-v0.2 -----------------

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards:  33%|███▎      | 1/3 [00:00<00:01,  1.64it/s]
Loading checkpoint shards:  67%|██████▋   | 2/3 [00:01<00:00,  1.63it/s]
Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  1.69it/s]
Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  1.67it/s]
loading /media/hangyu5/Home/Documents/Hugging-Face/LM_cocktail/xDAN-L1-Chat-RL-v1 -----------------

Loading checkpoint shards:   0%|          | 0/33 [00:00<?, ?it/s]
Loading checkpoint shards:   3%|▎         | 1/33 [00:00<00:03,  9.51it/s]
Loading checkpoint shards:   6%|▌         | 2/33 [00:00<00:03,  9.76it/s]
Loading checkpoint shards:  12%|█▏        | 4/33 [00:00<00:02,  9.97it/s]
Loading checkpoint shards:  15%|█▌        | 5/33 [00:00<00:02,  9.93it/s]
Loading checkpoint shards:  21%|██        | 7/33 [00:00<00:02, 10.00it/s]
Loading checkpoint shards:  24%|██▍       | 8/33 [00:00<00:02,  9.97it/s]
Loading checkpoint shards:  27%|██▋       | 9/33 [00:00<00:02,  9.55it/s]
Loading checkpoint shards:  30%|███       | 10/33 [00:01<00:02,  9.30it/s]
Loading checkpoint shards:  33%|███▎      | 11/33 [00:01<00:02,  9.34it/s]
Loading checkpoint shards:  39%|███▉      | 13/33 [00:01<00:02,  9.86it/s]
Loading checkpoint shards:  45%|████▌     | 15/33 [00:01<00:01, 10.14it/s]
Loading checkpoint shards:  52%|█████▏    | 17/33 [00:01<00:01, 10.32it/s]
Loading checkpoint shards:  58%|█████▊    | 19/33 [00:01<00:01, 10.43it/s]
Loading checkpoint shards:  64%|██████▎   | 21/33 [00:02<00:01, 10.50it/s]
Loading checkpoint shards:  70%|██████▉   | 23/33 [00:02<00:00, 10.57it/s]
Loading checkpoint shards:  76%|███████▌  | 25/33 [00:02<00:00, 10.61it/s]
Loading checkpoint shards:  82%|████████▏ | 27/33 [00:02<00:00, 10.66it/s]
Loading checkpoint shards:  88%|████████▊ | 29/33 [00:02<00:00, 10.70it/s]
Loading checkpoint shards:  94%|█████████▍| 31/33 [00:03<00:00, 10.72it/s]
Loading checkpoint shards: 100%|██████████| 33/33 [00:03<00:00, 10.65it/s]
Loading checkpoint shards: 100%|██████████| 33/33 [00:03<00:00, 10.30it/s]

Merging models:   0%|          | 0/291 [00:00<?, ?it/s]

Processing model.layers.26.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.26.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.61it/s][A

Processing model.layers.26.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.22it/s][A

Merging models:   0%|          | 1/291 [00:00<01:19,  3.63it/s]

Processing model.layers.5.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.5.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.29it/s][A

Processing model.layers.5.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.10it/s][A

Merging models:   1%|          | 2/291 [00:00<01:20,  3.59it/s]

Processing model.layers.16.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.16.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.78it/s][A

Merging models:   1%|          | 3/291 [00:00<01:08,  4.23it/s]

Processing model.layers.25.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.25.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.30it/s][A

Processing model.layers.25.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  6.75it/s][A

Merging models:   1%|▏         | 4/291 [00:01<01:14,  3.86it/s]

Processing model.layers.18.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.18.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.84it/s][A

Merging models:   2%|▏         | 5/291 [00:01<01:06,  4.31it/s]

Processing model.layers.12.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.12.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  9.37it/s][A

Processing model.layers.12.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  9.09it/s][A

Merging models:   2%|▏         | 6/291 [00:01<01:04,  4.39it/s]

Processing model.layers.14.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.14.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.32it/s][A

Processing model.layers.14.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.10it/s][A

Merging models:   2%|▏         | 7/291 [00:01<01:09,  4.08it/s]

Processing model.layers.11.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.11.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.24it/s][A

Processing model.layers.11.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.03it/s][A

Merging models:   3%|▎         | 8/291 [00:02<01:12,  3.88it/s]

Processing model.layers.26.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.26.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.30it/s][A

Processing model.layers.26.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  5.56it/s][A

Merging models:   3%|▎         | 9/291 [00:02<01:20,  3.50it/s]

Processing model.layers.5.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.5.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.27it/s][A

Merging models:   3%|▎         | 10/291 [00:02<01:10,  3.96it/s]

Processing model.layers.27.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.27.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.27it/s][A

Merging models:   4%|▍         | 11/291 [00:02<01:04,  4.36it/s]

Processing model.layers.3.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.3.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.06it/s][A

Processing model.layers.3.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.77it/s][A

Merging models:   4%|▍         | 12/291 [00:03<01:19,  3.49it/s]

Processing model.layers.30.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.30.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.23it/s][A

Merging models:   4%|▍         | 13/291 [00:03<01:10,  3.94it/s]

Processing model.layers.6.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.6.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.05it/s][A

Processing model.layers.6.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.78it/s][A

Merging models:   5%|▍         | 14/291 [00:03<01:23,  3.30it/s]

Processing model.layers.13.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.13.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.25it/s][A

Merging models:   5%|▌         | 15/291 [00:03<01:13,  3.77it/s]

Processing model.layers.28.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.28.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  9.13it/s][A

Processing model.layers.28.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  8.94it/s][A

Merging models:   5%|▌         | 16/291 [00:04<01:09,  3.96it/s]

Processing model.layers.19.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.19.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.74it/s][A

Merging models:   6%|▌         | 17/291 [00:04<01:03,  4.30it/s]

Processing model.layers.25.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.25.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s][A

Merging models:   6%|▌         | 18/291 [00:04<00:59,  4.62it/s]

Processing model.layers.20.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.20.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.99it/s][A

Processing model.layers.20.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.72it/s][A

Merging models:   7%|▋         | 19/291 [00:04<01:15,  3.60it/s]

Processing model.layers.3.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.3.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  9.17it/s][A

Processing model.layers.3.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  8.94it/s][A

Merging models:   7%|▋         | 20/291 [00:05<01:10,  3.82it/s]

Processing model.layers.28.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.28.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.02it/s][A

Processing model.layers.28.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.75it/s][A

Merging models:   7%|▋         | 21/291 [00:05<01:23,  3.24it/s]

Processing model.layers.10.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.10.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  9.16it/s][A

Processing model.layers.10.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  8.98it/s][A

Merging models:   8%|▊         | 22/291 [00:05<01:16,  3.54it/s]

Processing model.layers.20.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.20.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.10it/s][A

Processing model.layers.20.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.79it/s][A

Merging models:   8%|▊         | 23/291 [00:06<01:26,  3.10it/s]

Processing model.layers.23.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.23.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s][A

Merging models:   8%|▊         | 24/291 [00:06<01:14,  3.58it/s]

Processing model.layers.8.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.8.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.70it/s][A

Merging models:   9%|▊         | 25/291 [00:06<01:06,  3.97it/s]

Processing model.layers.3.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.3.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.73it/s][A

Merging models:   9%|▉         | 26/291 [00:06<01:01,  4.31it/s]

Processing model.layers.7.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.7.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.62it/s][A

Merging models:   9%|▉         | 27/291 [00:06<00:57,  4.56it/s]

Processing model.layers.25.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.25.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  9.17it/s][A

Processing model.layers.25.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  8.92it/s][A

Merging models:  10%|▉         | 28/291 [00:07<00:57,  4.54it/s]

Processing model.layers.26.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.26.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.80it/s][A

Processing model.layers.26.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  8.98it/s][A

Merging models:  10%|▉         | 29/291 [00:07<00:58,  4.51it/s]

Processing model.layers.17.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.17.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.68it/s][A

Merging models:  10%|█         | 30/291 [00:07<00:55,  4.73it/s]

Processing model.layers.13.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.13.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  9.34it/s][A

Processing model.layers.13.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  9.04it/s][A

Merging models:  11%|█         | 31/291 [00:07<00:55,  4.67it/s]

Processing model.layers.28.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.28.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.02it/s][A

Processing model.layers.28.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.75it/s][A

Merging models:  11%|█         | 32/291 [00:08<01:11,  3.63it/s]

Processing model.layers.28.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.28.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.72it/s][A

Merging models:  11%|█▏        | 33/291 [00:08<01:04,  4.02it/s]

Processing model.layers.6.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.6.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.23it/s][A

Merging models:  12%|█▏        | 34/291 [00:08<00:58,  4.39it/s]

Processing model.layers.4.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.4.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.71it/s][A

Merging models:  12%|█▏        | 35/291 [00:08<00:55,  4.64it/s]

Processing model.layers.4.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.4.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  9.32it/s][A

Processing model.layers.4.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  9.28it/s][A

Merging models:  12%|█▏        | 36/291 [00:08<00:54,  4.64it/s]

Processing model.layers.15.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.15.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.09it/s][A

Processing model.layers.15.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.77it/s][A

Merging models:  13%|█▎        | 37/291 [00:09<01:10,  3.63it/s]

Processing model.layers.19.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.19.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  9.29it/s][A

Processing model.layers.19.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  9.00it/s][A

Merging models:  13%|█▎        | 38/291 [00:09<01:05,  3.85it/s]

Processing model.layers.18.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.18.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.03it/s][A

Processing model.layers.18.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.72it/s][A

Merging models:  13%|█▎        | 39/291 [00:10<01:17,  3.25it/s]

Processing model.layers.11.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.11.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.07it/s][A

Processing model.layers.11.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.75it/s][A

Merging models:  14%|█▎        | 40/291 [00:10<01:25,  2.94it/s]

Processing model.layers.14.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.14.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  9.15it/s][A

Processing model.layers.14.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  9.09it/s][A

Merging models:  14%|█▍        | 41/291 [00:10<01:16,  3.28it/s]

Processing model.layers.6.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.6.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.76it/s][A

Merging models:  14%|█▍        | 42/291 [00:10<01:07,  3.72it/s]

Processing model.layers.2.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.2.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s][A

Merging models:  15%|█▍        | 43/291 [00:11<01:00,  4.13it/s]

Processing model.layers.18.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.18.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.75it/s][A

Merging models:  15%|█▌        | 44/291 [00:11<00:55,  4.44it/s]

Processing model.layers.12.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.12.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.63it/s][A

Merging models:  15%|█▌        | 45/291 [00:11<00:52,  4.67it/s]

Processing model.layers.0.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.0.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.70it/s][A

Merging models:  16%|█▌        | 46/291 [00:11<00:50,  4.85it/s]

Processing model.layers.0.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.0.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.02it/s][A

Processing model.layers.0.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.71it/s][A

Merging models:  16%|█▌        | 47/291 [00:12<01:06,  3.70it/s]

Processing model.layers.10.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.10.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.01it/s][A

Processing model.layers.10.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.74it/s][A

Merging models:  16%|█▋        | 48/291 [00:12<01:16,  3.17it/s]

Processing model.layers.15.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.15.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.79it/s][A

Merging models:  17%|█▋        | 49/291 [00:12<01:06,  3.62it/s]

Processing model.layers.24.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.24.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.74it/s][A

Merging models:  17%|█▋        | 50/291 [00:12<01:00,  4.01it/s]

Processing model.layers.24.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.24.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  9.05it/s][A

Processing model.layers.24.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  9.06it/s][A

Merging models:  18%|█▊        | 51/291 [00:13<00:57,  4.15it/s]

Processing model.layers.1.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.1.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s][A

Merging models:  18%|█▊        | 52/291 [00:13<00:53,  4.50it/s]

Processing model.layers.4.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.4.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.73it/s][A

Merging models:  18%|█▊        | 53/291 [00:13<00:50,  4.72it/s]

Processing model.layers.1.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.1.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.98it/s][A

Processing model.layers.1.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.72it/s][A

Merging models:  19%|█▊        | 54/291 [00:13<01:05,  3.64it/s]

Processing model.layers.19.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.19.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.83it/s][A

Merging models:  19%|█▉        | 55/291 [00:13<00:58,  4.04it/s]

Processing model.layers.16.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.16.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s][A

Merging models:  19%|█▉        | 56/291 [00:14<00:53,  4.41it/s]

Processing model.layers.31.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.31.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.91it/s][A

Processing model.layers.31.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.68it/s][A

Merging models:  20%|█▉        | 57/291 [00:14<01:07,  3.49it/s]

Processing model.layers.17.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.17.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 10.91it/s][A

Merging models:  20%|█▉        | 58/291 [00:14<00:59,  3.91it/s]

Processing model.layers.22.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.22.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.82it/s][A

Merging models:  20%|██        | 59/291 [00:14<00:54,  4.26it/s]

Processing model.layers.29.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.29.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s][A

Merging models:  21%|██        | 60/291 [00:15<00:50,  4.59it/s]

Processing model.layers.7.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.7.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  9.17it/s][A

Processing model.layers.7.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  8.10it/s][A

Merging models:  21%|██        | 61/291 [00:15<00:51,  4.44it/s]

Processing model.layers.30.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.30.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.80it/s][A

Processing model.layers.30.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.64it/s][A

Merging models:  21%|██▏       | 62/291 [00:15<01:05,  3.49it/s]

Processing model.layers.21.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.21.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.75it/s][A

Merging models:  22%|██▏       | 63/291 [00:16<00:58,  3.90it/s]

Processing model.layers.26.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.26.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.75it/s][A

Merging models:  22%|██▏       | 64/291 [00:16<00:53,  4.24it/s]

Processing model.layers.1.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.1.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.11it/s][A

Processing model.layers.1.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.57it/s][A

Merging models:  22%|██▏       | 65/291 [00:16<00:55,  4.11it/s]

Processing model.layers.24.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.24.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.81it/s][A

Processing model.layers.24.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.66it/s][A

Merging models:  23%|██▎       | 66/291 [00:16<01:07,  3.35it/s]

Processing model.layers.19.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.19.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s][A

Merging models:  23%|██▎       | 67/291 [00:17<00:58,  3.80it/s]

Processing model.layers.25.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.25.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.79it/s][A

Processing model.layers.25.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.48it/s][A

Merging models:  23%|██▎       | 68/291 [00:17<00:58,  3.79it/s]

Processing model.layers.12.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.12.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.17it/s][A

Processing model.layers.12.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.62it/s][A

Merging models:  24%|██▎       | 69/291 [00:17<00:58,  3.81it/s]

Processing model.layers.31.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.31.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.85it/s][A

Processing model.layers.31.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.69it/s][A

Merging models:  24%|██▍       | 70/291 [00:18<01:08,  3.21it/s]

Processing model.layers.27.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.27.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.21it/s][A

Processing model.layers.27.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.60it/s][A

Merging models:  24%|██▍       | 71/291 [00:18<01:05,  3.38it/s]

Processing model.layers.28.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.28.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.25it/s][A

Merging models:  25%|██▍       | 72/291 [00:18<00:57,  3.83it/s]

Processing model.layers.17.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.17.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.92it/s][A

Processing model.layers.17.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.73it/s][A

Merging models:  25%|██▌       | 73/291 [00:18<01:07,  3.24it/s]

Processing model.layers.7.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.7.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.27it/s][A

Merging models:  25%|██▌       | 74/291 [00:19<00:58,  3.71it/s]

Processing model.layers.4.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.4.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.00it/s][A

Processing model.layers.4.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.72it/s][A

Merging models:  26%|██▌       | 75/291 [00:19<01:08,  3.18it/s]

Processing model.layers.18.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.18.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s][A

Merging models:  26%|██▌       | 76/291 [00:19<00:58,  3.65it/s]

Processing model.layers.10.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.10.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.80it/s][A

Processing model.layers.10.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.52it/s][A

Merging models:  26%|██▋       | 77/291 [00:19<00:58,  3.69it/s]

Processing model.layers.6.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.6.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.49it/s][A

Merging models:  27%|██▋       | 78/291 [00:20<00:52,  4.04it/s]

Processing model.layers.2.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.2.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.89it/s][A

Processing model.layers.2.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.61it/s][A

Merging models:  27%|██▋       | 79/291 [00:20<00:53,  3.97it/s]

Processing model.layers.22.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.22.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.14it/s][A

Processing model.layers.22.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.63it/s][A

Merging models:  27%|██▋       | 80/291 [00:20<00:53,  3.93it/s]

Processing model.layers.7.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.7.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.58it/s][A

Merging models:  28%|██▊       | 81/291 [00:20<00:49,  4.26it/s]

Processing model.layers.4.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.4.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.25it/s][A

Merging models:  28%|██▊       | 82/291 [00:20<00:45,  4.59it/s]

Processing model.layers.31.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.31.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.74it/s][A

Merging models:  29%|██▊       | 83/291 [00:21<00:43,  4.80it/s]

Processing model.layers.1.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.1.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.12it/s][A

Processing model.layers.1.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.91it/s][A

Merging models:  29%|██▉       | 84/291 [00:21<00:55,  3.74it/s]

Processing model.layers.29.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.29.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.12it/s][A

Processing model.layers.29.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.82it/s][A

Merging models:  29%|██▉       | 85/291 [00:22<01:04,  3.22it/s]

Processing model.layers.20.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.20.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.40it/s][A

Merging models:  30%|██▉       | 86/291 [00:22<00:56,  3.63it/s]

Processing model.layers.20.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.20.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.23it/s][A

Merging models:  30%|██▉       | 87/291 [00:22<00:50,  4.06it/s]

Processing model.layers.22.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.22.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.14it/s][A

Processing model.layers.22.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.81it/s][A

Merging models:  30%|███       | 88/291 [00:22<01:00,  3.38it/s]

Processing model.layers.23.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.23.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.75it/s][A

Merging models:  31%|███       | 89/291 [00:22<00:53,  3.80it/s]

Processing model.layers.25.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.25.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.58it/s][A

Merging models:  31%|███       | 90/291 [00:23<00:48,  4.15it/s]

Processing model.layers.19.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.19.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.14it/s][A

Processing model.layers.19.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.86it/s][A

Merging models:  31%|███▏      | 91/291 [00:23<00:58,  3.43it/s]

Processing model.layers.6.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.6.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.26it/s][A

Merging models:  32%|███▏      | 92/291 [00:23<00:51,  3.89it/s]

Processing model.layers.21.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.21.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.99it/s][A

Processing model.layers.21.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.68it/s][A

Merging models:  32%|███▏      | 93/291 [00:24<00:51,  3.88it/s]

Processing model.layers.0.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.0.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.85it/s][A

Processing model.layers.0.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.74it/s][A

Merging models:  32%|███▏      | 94/291 [00:24<01:00,  3.26it/s]

Processing model.layers.8.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.8.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.60it/s][A

Merging models:  33%|███▎      | 95/291 [00:24<00:53,  3.68it/s]

Processing model.layers.0.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.0.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.52it/s][A

Processing model.layers.0.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.86it/s][A

Merging models:  33%|███▎      | 96/291 [00:24<00:51,  3.76it/s]

Processing model.layers.21.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.21.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.25it/s][A

Merging models:  33%|███▎      | 97/291 [00:25<00:46,  4.18it/s]

Processing model.layers.5.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.5.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.71it/s][A

Merging models:  34%|███▎      | 98/291 [00:25<00:43,  4.47it/s]

Processing model.layers.1.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.1.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.05it/s][A

Processing model.layers.1.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.84it/s][A

Merging models:  34%|███▍      | 99/291 [00:25<00:53,  3.57it/s]

Processing model.layers.25.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.25.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.10it/s][A

Processing model.layers.25.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.81it/s][A

Merging models:  34%|███▍      | 100/291 [00:26<01:01,  3.13it/s]

Processing model.layers.26.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.26.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.26it/s][A

Merging models:  35%|███▍      | 101/291 [00:26<00:52,  3.61it/s]

Processing model.layers.31.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.31.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.44it/s][A

Processing model.layers.31.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.88it/s][A

Merging models:  35%|███▌      | 102/291 [00:26<00:50,  3.71it/s]

Processing model.layers.11.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.11.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.26it/s][A

Merging models:  35%|███▌      | 103/291 [00:26<00:45,  4.13it/s]

Processing model.layers.2.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.2.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.02it/s][A

Processing model.layers.2.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.75it/s][A

Merging models:  36%|███▌      | 104/291 [00:26<00:46,  4.05it/s]

Processing model.layers.25.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.25.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  8.84it/s][A

Merging models:  36%|███▌      | 105/291 [00:27<00:44,  4.16it/s]

Processing model.layers.13.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.13.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.02it/s][A

Processing model.layers.13.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.82it/s][A

Merging models:  36%|███▋      | 106/291 [00:27<00:54,  3.42it/s]

Processing model.layers.4.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.4.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.02it/s][A

Processing model.layers.4.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.78it/s][A

Merging models:  37%|███▋      | 107/291 [00:27<01:00,  3.04it/s]

Processing model.layers.29.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.29.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.99it/s][A

Processing model.layers.29.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.57it/s][A

Merging models:  37%|███▋      | 108/291 [00:28<01:05,  2.78it/s]

Processing model.layers.20.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.20.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 10.10it/s][A

Merging models:  37%|███▋      | 109/291 [00:28<00:56,  3.21it/s]

Processing model.layers.16.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.16.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.06it/s][A

Processing model.layers.16.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.79it/s][A

Merging models:  38%|███▊      | 110/291 [00:29<01:02,  2.92it/s]

Processing model.layers.23.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.23.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.09it/s][A

Processing model.layers.23.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.80it/s][A

Merging models:  38%|███▊      | 111/291 [00:29<01:05,  2.75it/s]

Processing model.layers.6.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.6.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.13it/s][A

Processing model.layers.6.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.81it/s][A

Merging models:  38%|███▊      | 112/291 [00:29<01:07,  2.64it/s]

Processing model.layers.29.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.29.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.24it/s][A

Merging models:  39%|███▉      | 113/291 [00:30<00:56,  3.14it/s]

Processing model.layers.12.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.12.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.09it/s][A

Merging models:  39%|███▉      | 114/291 [00:30<00:49,  3.61it/s]

Processing model.layers.12.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.12.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.46it/s][A

Merging models:  40%|███▉      | 115/291 [00:30<00:44,  3.98it/s]

Processing model.layers.18.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.18.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.07it/s][A

Processing model.layers.18.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.81it/s][A

Merging models:  40%|███▉      | 116/291 [00:30<00:52,  3.33it/s]

Processing model.layers.26.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.26.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s][A

Merging models:  40%|████      | 117/291 [00:30<00:45,  3.79it/s]

Processing model.embed_tokens.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.embed_tokens.weight:  50%|█████     | 1/2 [00:00<00:00,  2.18it/s][A

Processing model.embed_tokens.weight: 100%|██████████| 2/2 [00:00<00:00,  2.58it/s][A

Merging models:  41%|████      | 118/291 [00:31<01:13,  2.36it/s]

Processing model.layers.13.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.13.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.73it/s][A

Merging models:  41%|████      | 119/291 [00:31<01:00,  2.83it/s]

Processing model.layers.10.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.10.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.16it/s][A

Processing model.layers.10.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.87it/s][A

Merging models:  41%|████      | 120/291 [00:32<01:03,  2.71it/s]

Processing model.layers.0.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.0.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.42it/s][A

Processing model.layers.0.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.78it/s][A

Merging models:  42%|████▏     | 121/291 [00:32<00:56,  2.99it/s]

Processing model.layers.29.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.29.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.60it/s][A

Processing model.layers.29.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.64it/s][A

Merging models:  42%|████▏     | 122/291 [00:33<01:01,  2.75it/s]

Processing model.layers.30.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.30.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.07it/s][A

Processing model.layers.30.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.72it/s][A

Merging models:  42%|████▏     | 123/291 [00:33<01:03,  2.63it/s]

Processing model.layers.12.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.12.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.10it/s][A

Processing model.layers.12.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.75it/s][A

Merging models:  43%|████▎     | 124/291 [00:33<01:05,  2.55it/s]

Processing model.layers.27.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.27.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.06it/s][A

Merging models:  43%|████▎     | 125/291 [00:34<00:54,  3.04it/s]

Processing model.layers.22.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.22.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.06it/s][A

Processing model.layers.22.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.59it/s][A

Merging models:  43%|████▎     | 126/291 [00:34<00:59,  2.79it/s]

Processing model.layers.28.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.28.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.07it/s][A

Processing model.layers.28.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.77it/s][A

Merging models:  44%|████▎     | 127/291 [00:34<01:01,  2.66it/s]

Processing model.layers.29.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.29.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.64it/s][A

Merging models:  44%|████▍     | 128/291 [00:35<00:52,  3.13it/s]

Processing model.layers.0.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.0.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.05it/s][A

Processing model.layers.0.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.77it/s][A

Merging models:  44%|████▍     | 129/291 [00:35<00:56,  2.87it/s]

Processing model.layers.27.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.27.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.09it/s][A

Processing model.layers.27.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.79it/s][A

Merging models:  45%|████▍     | 130/291 [00:35<00:59,  2.71it/s]

Processing model.layers.9.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.9.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.62it/s][A

Merging models:  45%|████▌     | 131/291 [00:36<00:50,  3.18it/s]

Processing model.layers.17.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.17.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.03it/s][A

Processing model.layers.17.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.60it/s][A

Merging models:  45%|████▌     | 132/291 [00:36<00:47,  3.35it/s]

Processing model.layers.22.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.22.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.63it/s][A

Merging models:  46%|████▌     | 133/291 [00:36<00:41,  3.77it/s]

Processing model.layers.9.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.9.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s][A

Merging models:  46%|████▌     | 134/291 [00:36<00:37,  4.17it/s]

Processing model.layers.16.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.16.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.06it/s][A

Processing model.layers.16.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.83it/s][A

Merging models:  46%|████▋     | 135/291 [00:37<00:45,  3.43it/s]

Processing model.layers.21.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.21.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.76it/s][A

Merging models:  47%|████▋     | 136/291 [00:37<00:40,  3.85it/s]

Processing model.layers.8.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.8.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.02it/s][A

Processing model.layers.8.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.62it/s][A

Merging models:  47%|████▋     | 137/291 [00:37<00:40,  3.84it/s]

Processing model.layers.31.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.31.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.91it/s][A

Processing model.layers.31.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.75it/s][A

Merging models:  47%|████▋     | 138/291 [00:38<00:47,  3.25it/s]

Processing model.layers.7.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.7.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.11it/s][A

Processing model.layers.7.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.82it/s][A

Merging models:  48%|████▊     | 139/291 [00:38<00:51,  2.95it/s]

Processing model.layers.15.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.15.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.12it/s][A

Processing model.layers.15.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.81it/s][A

Merging models:  48%|████▊     | 140/291 [00:38<00:54,  2.77it/s]

Processing model.layers.19.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.19.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.07it/s][A

Processing model.layers.19.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.75it/s][A

Merging models:  48%|████▊     | 141/291 [00:39<00:56,  2.65it/s]

Processing model.layers.20.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.20.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.62it/s][A

Merging models:  49%|████▉     | 142/291 [00:39<00:47,  3.11it/s]

Processing model.layers.25.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.25.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.21it/s][A

Merging models:  49%|████▉     | 143/291 [00:39<00:41,  3.59it/s]

Processing model.layers.26.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.26.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.50it/s][A

Merging models:  49%|████▉     | 144/291 [00:39<00:37,  3.96it/s]

Processing model.layers.24.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.24.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.07it/s][A

Processing model.layers.24.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.78it/s][A

Merging models:  50%|████▉     | 145/291 [00:40<00:43,  3.32it/s]

Processing model.layers.1.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.1.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.93it/s][A

Processing model.layers.1.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.65it/s][A

Merging models:  50%|█████     | 146/291 [00:40<00:41,  3.46it/s]

Processing model.layers.24.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.24.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.96it/s][A

Processing model.layers.24.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.64it/s][A

Merging models:  51%|█████     | 147/291 [00:40<00:40,  3.57it/s]

Processing model.layers.15.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.15.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.29it/s][A

Processing model.layers.15.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.79it/s][A

Merging models:  51%|█████     | 148/291 [00:41<00:39,  3.67it/s]

Processing model.layers.2.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.2.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.90it/s][A

Processing model.layers.2.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.76it/s][A

Merging models:  51%|█████     | 149/291 [00:41<00:44,  3.16it/s]

Processing model.layers.10.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.10.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.60it/s][A

Merging models:  52%|█████▏    | 150/291 [00:41<00:39,  3.59it/s]

Processing model.layers.11.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.11.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.28it/s][A

                                                                                                [A
Merging models:  52%|█████▏    | 151/291 [00:41<00:34,  4.03it/s]

Processing model.layers.31.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.31.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.39it/s][A

Processing model.layers.31.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.81it/s][A

                                                                                                 [A
Merging models:  52%|█████▏    | 152/291 [00:42<00:34,  4.00it/s]

Processing model.layers.11.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.11.self_attn.v_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.89it/s][A

Processing model.layers.11.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.89it/s][A

                                                                                                 [A
Merging models:  53%|█████▎    | 153/291 [00:42<00:34,  3.98it/s]

Processing model.layers.23.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.23.self_attn.v_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.12it/s][A

Processing model.layers.23.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  8.37it/s][A

                                                                                                 [A
Merging models:  53%|█████▎    | 154/291 [00:42<00:33,  4.03it/s]

Processing model.layers.12.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.12.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.96it/s][A

Processing model.layers.12.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.74it/s][A

                                                                                            [A
Merging models:  53%|█████▎    | 155/291 [00:42<00:40,  3.34it/s]

Processing model.layers.22.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.22.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.98it/s][A

Processing model.layers.22.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.67it/s][A

                                                                                            [A
Merging models:  54%|█████▎    | 156/291 [00:43<00:45,  2.97it/s]

Processing model.layers.30.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.30.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.02it/s][A

                                                                                                 [A
Merging models:  54%|█████▍    | 157/291 [00:43<00:39,  3.38it/s]

Processing model.layers.2.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.2.self_attn.k_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  9.64it/s][A

Processing model.layers.2.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  8.93it/s][A

                                                                                                [A
Merging models:  54%|█████▍    | 158/291 [00:43<00:36,  3.65it/s]

Processing model.layers.8.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.8.input_layernorm.weight:  50%|█████     | 1/2 [00:00<00:00,  8.69it/s][A

Processing model.layers.8.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00,  8.62it/s][A

                                                                                               [A
Merging models:  55%|█████▍    | 159/291 [00:44<00:34,  3.83it/s]

Processing model.layers.12.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.12.post_attention_layernorm.weight:  50%|█████     | 1/2 [00:00<00:00,  8.66it/s][A

Processing model.layers.12.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00,  8.63it/s][A

                                                                                                         [A
Merging models:  55%|█████▍    | 160/291 [00:44<00:33,  3.96it/s]

Processing model.layers.17.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.17.input_layernorm.weight:  50%|█████     | 1/2 [00:00<00:00,  8.67it/s][A

Processing model.layers.17.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00,  8.65it/s][A

                                                                                                [A
Merging models:  55%|█████▌    | 161/291 [00:44<00:32,  4.06it/s]

Processing model.layers.8.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.8.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.11it/s][A

Processing model.layers.8.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.85it/s][A

                                                                                           [A
Merging models:  56%|█████▌    | 162/291 [00:44<00:38,  3.39it/s]

Processing model.layers.21.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.21.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s][A

                                                                                                         [A
Merging models:  56%|█████▌    | 163/291 [00:45<00:33,  3.84it/s]

Processing model.layers.24.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.24.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.21it/s][A

                                                                                                [A
Merging models:  56%|█████▋    | 164/291 [00:45<00:29,  4.24it/s]

Processing model.layers.13.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.13.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.09it/s][A

Processing model.layers.13.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.79it/s][A

                                                                                            [A
Merging models:  57%|█████▋    | 165/291 [00:45<00:36,  3.46it/s]

Processing model.layers.2.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.2.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.26it/s][A

                                                                                               [A
Merging models:  57%|█████▋    | 166/291 [00:45<00:31,  3.91it/s]

Processing model.layers.0.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.0.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.24it/s][A

                                                                                               [A
Merging models:  57%|█████▋    | 167/291 [00:46<00:28,  4.30it/s]

Processing lm_head.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing lm_head.weight:  50%|█████     | 1/2 [00:00<00:00,  2.22it/s][A

Processing lm_head.weight: 100%|██████████| 2/2 [00:00<00:00,  2.56it/s][A

                                                                        [A
Merging models:  58%|█████▊    | 168/291 [00:46<00:49,  2.48it/s]

Processing model.norm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.norm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s][A

                                                                           [A
Merging models:  58%|█████▊    | 169/291 [00:47<00:40,  2.98it/s]

Processing model.layers.9.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.9.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.23it/s][A

Processing model.layers.9.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.89it/s][A

                                                                                             [A
Merging models:  58%|█████▊    | 170/291 [00:47<00:43,  2.80it/s]

Processing model.layers.20.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.20.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.70it/s][A

Processing model.layers.20.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.48it/s][A

                                                                                                 [A
Merging models:  59%|█████▉    | 171/291 [00:47<00:39,  3.03it/s]

Processing model.layers.15.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.15.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s][A

                                                                                                         [A
Merging models:  59%|█████▉    | 172/291 [00:47<00:33,  3.52it/s]

Processing model.layers.31.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.31.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.76it/s][A

                                                                                                 [A
Merging models:  59%|█████▉    | 173/291 [00:48<00:30,  3.92it/s]

Processing model.layers.14.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.14.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.23it/s][A

                                                                                                         [A
Merging models:  60%|█████▉    | 174/291 [00:48<00:27,  4.31it/s]

Processing model.layers.6.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.6.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.03it/s][A

Processing model.layers.6.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.80it/s][A

                                                                                           [A
Merging models:  60%|██████    | 175/291 [00:48<00:33,  3.49it/s]

Processing model.layers.9.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.9.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.29it/s][A

                                                                                               [A
Merging models:  60%|██████    | 176/291 [00:48<00:29,  3.94it/s]

Processing model.layers.27.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.27.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.76it/s][A

                                                                                                 [A
Merging models:  61%|██████    | 177/291 [00:49<00:26,  4.28it/s]

Processing model.layers.10.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.10.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.68it/s][A

                                                                                                 [A
Merging models:  61%|██████    | 178/291 [00:49<00:24,  4.55it/s]

Processing model.layers.11.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.11.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.00it/s][A

Processing model.layers.11.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.64it/s][A

                                                                                                 [A
Merging models:  62%|██████▏   | 179/291 [00:49<00:25,  4.31it/s]

Processing model.layers.11.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.11.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.96it/s][A

Processing model.layers.11.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.79it/s][A

                                                                                              [A
Merging models:  62%|██████▏   | 180/291 [00:49<00:31,  3.48it/s]

Processing model.layers.15.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.15.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.93it/s][A

Processing model.layers.15.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.59it/s][A

                                                                                                 [A
Merging models:  62%|██████▏   | 181/291 [00:50<00:30,  3.58it/s]

Processing model.layers.15.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.15.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.60it/s][A

                                                                                                 [A
Merging models:  63%|██████▎   | 182/291 [00:50<00:27,  3.96it/s]

Processing model.layers.31.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.31.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s][A

                                                                                                [A
Merging models:  63%|██████▎   | 183/291 [00:50<00:24,  4.34it/s]

Processing model.layers.7.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.7.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.38it/s][A

Processing model.layers.7.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.76it/s][A

                                                                                                [A
Merging models:  63%|██████▎   | 184/291 [00:50<00:25,  4.21it/s]

Processing model.layers.5.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.5.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.89it/s][A

Processing model.layers.5.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.68it/s][A

                                                                                                [A
Merging models:  64%|██████▎   | 185/291 [00:51<00:25,  4.09it/s]

Processing model.layers.28.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.28.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s][A

                                                                                                         [A
Merging models:  64%|██████▍   | 186/291 [00:51<00:23,  4.45it/s]

Processing model.layers.20.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.20.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.98it/s][A

Processing model.layers.20.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.80it/s][A

                                                                                              [A
Merging models:  64%|██████▍   | 187/291 [00:51<00:29,  3.55it/s]

Processing model.layers.16.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.16.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.04it/s][A

Processing model.layers.16.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.77it/s][A

                                                                                              [A
Merging models:  65%|██████▍   | 188/291 [00:52<00:33,  3.10it/s]

Processing model.layers.21.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.21.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.83it/s][A

Processing model.layers.21.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.51it/s][A

                                                                                                 [A
Merging models:  65%|██████▍   | 189/291 [00:52<00:31,  3.28it/s]

Processing model.layers.2.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.2.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.85it/s][A

Processing model.layers.2.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.69it/s][A

                                                                                             [A
Merging models:  65%|██████▌   | 190/291 [00:52<00:34,  2.93it/s]

Processing model.layers.3.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.3.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.81it/s][A

                                                                                                [A
Merging models:  66%|██████▌   | 191/291 [00:52<00:29,  3.40it/s]

Processing model.layers.14.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.14.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.66it/s][A

                                                                                                 [A
Merging models:  66%|██████▌   | 192/291 [00:53<00:25,  3.81it/s]

Processing model.layers.0.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.0.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.54it/s][A

                                                                                                [A
Merging models:  66%|██████▋   | 193/291 [00:53<00:23,  4.15it/s]

Processing model.layers.24.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.24.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.23it/s][A

                                                                                                         [A
Merging models:  67%|██████▋   | 194/291 [00:53<00:21,  4.50it/s]

Processing model.layers.23.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.23.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.11it/s][A

Processing model.layers.23.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.79it/s][A

                                                                                              [A
Merging models:  67%|██████▋   | 195/291 [00:53<00:26,  3.58it/s]

Processing model.layers.14.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.14.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.62it/s][A

                                                                                                 [A
Merging models:  67%|██████▋   | 196/291 [00:54<00:23,  3.96it/s]

Processing model.layers.6.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.6.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.31it/s][A

Processing model.layers.6.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.71it/s][A

                                                                                                [A
Merging models:  68%|██████▊   | 197/291 [00:54<00:23,  3.94it/s]

Processing model.layers.13.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.13.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.90it/s][A

Processing model.layers.13.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.70it/s][A

                                                                                              [A
Merging models:  68%|██████▊   | 198/291 [00:54<00:28,  3.28it/s]

Processing model.layers.27.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.27.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.34it/s][A

Processing model.layers.27.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.71it/s][A

                                                                                                 [A
Merging models:  68%|██████▊   | 199/291 [00:55<00:26,  3.44it/s]

Processing model.layers.29.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.29.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.32it/s][A

Processing model.layers.29.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.67it/s][A

                                                                                                 [A
Merging models:  69%|██████▊   | 200/291 [00:55<00:25,  3.56it/s]

Processing model.layers.16.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.16.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.84it/s][A

Processing model.layers.16.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.53it/s][A

                                                                                                 [A
Merging models:  69%|██████▉   | 201/291 [00:55<00:24,  3.63it/s]

Processing model.layers.27.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.27.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.56it/s][A

                                                                                                 [A
Merging models:  69%|██████▉   | 202/291 [00:55<00:22,  4.00it/s]

Processing model.layers.14.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.14.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.87it/s][A

Processing model.layers.14.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.57it/s][A

                                                                                                 [A
Merging models:  70%|██████▉   | 203/291 [00:56<00:22,  3.94it/s]

Processing model.layers.30.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.30.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.17it/s][A

                                                                                                [A
Merging models:  70%|███████   | 204/291 [00:56<00:20,  4.32it/s]

Processing model.layers.3.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.3.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.17it/s][A

                                                                                               [A
Merging models:  70%|███████   | 205/291 [00:56<00:18,  4.63it/s]

Processing model.layers.30.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.30.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.42it/s][A

Processing model.layers.30.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.71it/s][A

                                                                                                 [A
Merging models:  71%|███████   | 206/291 [00:56<00:19,  4.38it/s]

Processing model.layers.23.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.23.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.85it/s][A

Processing model.layers.23.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.73it/s][A

                                                                                              [A
Merging models:  71%|███████   | 207/291 [00:57<00:24,  3.49it/s]

Processing model.layers.17.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.17.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.31it/s][A

Processing model.layers.17.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.69it/s][A

                                                                                                 [A
Merging models:  71%|███████▏  | 208/291 [00:57<00:23,  3.60it/s]

Processing model.layers.8.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.8.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.91it/s][A

Processing model.layers.8.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.71it/s][A

                                                                                             [A
Merging models:  72%|███████▏  | 209/291 [00:57<00:26,  3.11it/s]

Processing model.layers.17.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.17.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.66it/s][A

                                                                                                 [A
Merging models:  72%|███████▏  | 210/291 [00:57<00:22,  3.56it/s]

Processing model.layers.1.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.1.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s][A

                                                                                                        [A
Merging models:  73%|███████▎  | 211/291 [00:58<00:20,  3.99it/s]

Processing model.layers.11.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.11.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.46it/s][A

                                                                                                 [A
Merging models:  73%|███████▎  | 212/291 [00:58<00:18,  4.29it/s]

Processing model.layers.28.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.28.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.58it/s][A

                                                                                                 [A
Merging models:  73%|███████▎  | 213/291 [00:58<00:17,  4.55it/s]

Processing model.layers.4.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.4.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s][A

                                                                                               [A
Merging models:  74%|███████▎  | 214/291 [00:58<00:15,  4.82it/s]

Processing model.layers.21.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.21.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.10it/s][A

Processing model.layers.21.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.79it/s][A

                                                                                              [A
Merging models:  74%|███████▍  | 215/291 [00:59<00:20,  3.71it/s]

Processing model.layers.30.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.30.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.50it/s][A

                                                                                                 [A
Merging models:  74%|███████▍  | 216/291 [00:59<00:18,  4.06it/s]

Processing model.layers.3.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.3.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.14it/s][A

                                                                                                        [A
Merging models:  75%|███████▍  | 217/291 [00:59<00:16,  4.42it/s]

Processing model.layers.19.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.19.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.18it/s][A

                                                                                                [A
Merging models:  75%|███████▍  | 218/291 [00:59<00:15,  4.71it/s]

Processing model.layers.9.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.9.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.82it/s][A

Processing model.layers.9.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.51it/s][A

                                                                                                [A
Merging models:  75%|███████▌  | 219/291 [00:59<00:16,  4.38it/s]

Processing model.layers.9.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.9.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.90it/s][A

Processing model.layers.9.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.77it/s][A

                                                                                             [A
Merging models:  76%|███████▌  | 220/291 [01:00<00:20,  3.51it/s]

Processing model.layers.30.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.30.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.04it/s][A

Processing model.layers.30.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.73it/s][A

                                                                                              [A
Merging models:  76%|███████▌  | 221/291 [01:00<00:22,  3.07it/s]

Processing model.layers.6.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.6.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.17it/s][A

Processing model.layers.6.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.60it/s][A

                                                                                                [A
Merging models:  76%|███████▋  | 222/291 [01:00<00:21,  3.27it/s]

Processing model.layers.27.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.27.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.85it/s][A

Processing model.layers.27.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.70it/s][A

                                                                                              [A
Merging models:  77%|███████▋  | 223/291 [01:01<00:23,  2.93it/s]

Processing model.layers.17.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.17.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.03it/s][A

Processing model.layers.17.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.75it/s][A

                                                                                              [A
Merging models:  77%|███████▋  | 224/291 [01:01<00:24,  2.74it/s]

Processing model.layers.26.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.26.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.97it/s][A

Processing model.layers.26.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.72it/s][A

                                                                                            [A
Merging models:  77%|███████▋  | 225/291 [01:02<00:25,  2.62it/s]

Processing model.layers.16.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.16.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.18it/s][A

                                                                                                         [A
Merging models:  78%|███████▊  | 226/291 [01:02<00:20,  3.12it/s]

Processing model.layers.10.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.10.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.02it/s][A

Processing model.layers.10.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.70it/s][A

                                                                                              [A
Merging models:  78%|███████▊  | 227/291 [01:02<00:22,  2.85it/s]

Processing model.layers.23.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.23.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.72it/s][A

Processing model.layers.23.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.37it/s][A

                                                                                                 [A
Merging models:  78%|███████▊  | 228/291 [01:03<00:20,  3.06it/s]

Processing model.layers.3.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.3.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.81it/s][A

Processing model.layers.3.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.62it/s][A

                                                                                             [A
Merging models:  79%|███████▊  | 229/291 [01:03<00:22,  2.79it/s]

Processing model.layers.24.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.24.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.98it/s][A

Processing model.layers.24.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.67it/s][A

                                                                                            [A
Merging models:  79%|███████▉  | 230/291 [01:03<00:23,  2.65it/s]

Processing model.layers.14.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.14.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.19it/s][A

                                                                                                [A
Merging models:  79%|███████▉  | 231/291 [01:04<00:19,  3.14it/s]

Processing model.layers.23.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.23.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.12it/s][A

                                                                                                         [A
Merging models:  80%|███████▉  | 232/291 [01:04<00:16,  3.61it/s]

Processing model.layers.10.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.10.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s][A

                                                                                                         [A
Merging models:  80%|████████  | 233/291 [01:04<00:14,  4.04it/s]

Processing model.layers.11.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.11.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.72it/s][A

Processing model.layers.11.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.38it/s][A

                                                                                                 [A
Merging models:  80%|████████  | 234/291 [01:04<00:14,  3.94it/s]

Processing model.layers.7.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.7.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.15it/s][A

                                                                                               [A
Merging models:  81%|████████  | 235/291 [01:04<00:12,  4.31it/s]

Processing model.layers.14.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.14.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.84it/s][A

Processing model.layers.14.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.65it/s][A

                                                                                            [A
Merging models:  81%|████████  | 236/291 [01:05<00:15,  3.44it/s]

Processing model.layers.5.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.5.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.52it/s][A

                                                                                                [A
Merging models:  81%|████████▏ | 237/291 [01:05<00:14,  3.84it/s]

Processing model.layers.14.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.14.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.05it/s][A

Processing model.layers.14.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.72it/s][A

                                                                                              [A
Merging models:  82%|████████▏ | 238/291 [01:06<00:16,  3.24it/s]

Processing model.layers.0.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.0.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.18it/s][A

                                                                                                        [A
Merging models:  82%|████████▏ | 239/291 [01:06<00:14,  3.71it/s]

Processing model.layers.17.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.17.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.98it/s][A

Processing model.layers.17.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.69it/s][A

                                                                                            [A
Merging models:  82%|████████▏ | 240/291 [01:06<00:16,  3.17it/s]

Processing model.layers.25.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.25.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.99it/s][A

Processing model.layers.25.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.68it/s][A

                                                                                            [A
Merging models:  83%|████████▎ | 241/291 [01:07<00:17,  2.87it/s]

Processing model.layers.5.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.5.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.96it/s][A

Processing model.layers.5.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.67it/s][A

                                                                                           [A
Merging models:  83%|████████▎ | 242/291 [01:07<00:18,  2.69it/s]

Processing model.layers.18.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.18.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.96it/s][A

Processing model.layers.18.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.67it/s][A

                                                                                              [A
Merging models:  84%|████████▎ | 243/291 [01:07<00:18,  2.58it/s]

Processing model.layers.22.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.22.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.76it/s][A

Processing model.layers.22.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.37it/s][A

                                                                                                 [A
Merging models:  84%|████████▍ | 244/291 [01:08<00:16,  2.84it/s]

Processing model.layers.10.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.10.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 10.99it/s][A

                                                                                                [A
Merging models:  84%|████████▍ | 245/291 [01:08<00:13,  3.32it/s]

Processing model.layers.19.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.19.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.84it/s][A

Processing model.layers.19.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.64it/s][A

                                                                                            [A
Merging models:  85%|████████▍ | 246/291 [01:08<00:15,  2.95it/s]

Processing model.layers.21.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.21.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.86it/s][A

Processing model.layers.21.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.57it/s][A

                                                                                              [A
Merging models:  85%|████████▍ | 247/291 [01:09<00:16,  2.72it/s]

Processing model.layers.19.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.19.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.10it/s][A

Processing model.layers.19.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.45it/s][A

                                                                                                 [A
Merging models:  85%|████████▌ | 248/291 [01:09<00:14,  2.97it/s]

Processing model.layers.1.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.1.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.45it/s][A

                                                                                                [A
Merging models:  86%|████████▌ | 249/291 [01:09<00:12,  3.41it/s]

Processing model.layers.9.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.9.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.36it/s][A

                                                                                                [A
Merging models:  86%|████████▌ | 250/291 [01:09<00:10,  3.79it/s]

Processing model.layers.24.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.24.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.45it/s][A

                                                                                                 [A
Merging models:  86%|████████▋ | 251/291 [01:10<00:09,  4.13it/s]

Processing model.layers.8.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.8.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.10it/s][A

Processing model.layers.8.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.72it/s][A

                                                                                             [A
Merging models:  87%|████████▋ | 252/291 [01:10<00:11,  3.39it/s]

Processing model.layers.8.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.8.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.14it/s][A

                                                                                                        [A
Merging models:  87%|████████▋ | 253/291 [01:10<00:09,  3.84it/s]

Processing model.layers.28.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.28.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.57it/s][A

Processing model.layers.28.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.28it/s][A

                                                                                                 [A
Merging models:  87%|████████▋ | 254/291 [01:10<00:09,  3.78it/s]

Processing model.layers.1.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.1.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  9.52it/s][A

                                                                                                [A
Merging models:  88%|████████▊ | 255/291 [01:11<00:08,  4.03it/s]

Processing model.layers.13.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.13.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.71it/s][A

Processing model.layers.13.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.39it/s][A

                                                                                                 [A
Merging models:  88%|████████▊ | 256/291 [01:11<00:08,  3.93it/s]

Processing model.layers.9.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.9.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.58it/s][A

Processing model.layers.9.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.32it/s][A

                                                                                                [A
Merging models:  88%|████████▊ | 257/291 [01:11<00:08,  3.85it/s]

Processing model.layers.5.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.5.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  3.27it/s][A

Processing model.layers.5.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  3.87it/s][A

                                                                                             [A
Merging models:  89%|████████▊ | 258/291 [01:12<00:11,  2.93it/s]

Processing model.layers.30.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.30.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.66it/s][A

Processing model.layers.30.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.35it/s][A

                                                                                                 [A
Merging models:  89%|████████▉ | 259/291 [01:12<00:10,  3.12it/s]

Processing model.layers.5.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.5.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.15it/s][A

                                                                                               [A
Merging models:  89%|████████▉ | 260/291 [01:12<00:08,  3.60it/s]

Processing model.layers.18.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.18.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.56it/s][A

Processing model.layers.18.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  6.63it/s][A

                                                                                                 [A
Merging models:  90%|████████▉ | 261/291 [01:12<00:08,  3.52it/s]

Processing model.layers.20.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.20.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.06it/s][A

Processing model.layers.20.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.11it/s][A

                                                                                                 [A
Merging models:  90%|█████████ | 262/291 [01:13<00:08,  3.53it/s]

Processing model.layers.21.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.21.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.76it/s][A

Processing model.layers.21.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.55it/s][A

                                                                                            [A
Merging models:  90%|█████████ | 263/291 [01:13<00:09,  3.04it/s]

Processing model.layers.22.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.22.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.16it/s][A

                                                                                                         [A
Merging models:  91%|█████████ | 264/291 [01:13<00:07,  3.52it/s]

Processing model.layers.4.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.4.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.88it/s][A

Processing model.layers.4.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  3.50it/s][A

                                                                                             [A
Merging models:  91%|█████████ | 265/291 [01:14<00:09,  2.75it/s]

Processing model.layers.18.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.18.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.19it/s][A

                                                                                                         [A
Merging models:  91%|█████████▏| 266/291 [01:14<00:07,  3.25it/s]

Processing model.layers.22.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.22.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.17it/s][A

                                                                                                [A
Merging models:  92%|█████████▏| 267/291 [01:14<00:06,  3.71it/s]

Processing model.layers.16.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.16.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.65it/s][A

Processing model.layers.16.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.29it/s][A

                                                                                                 [A
Merging models:  92%|█████████▏| 268/291 [01:15<00:06,  3.70it/s]

Processing model.layers.7.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.7.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.65it/s][A

Processing model.layers.7.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.50it/s][A

                                                                                             [A
Merging models:  92%|█████████▏| 269/291 [01:15<00:07,  3.10it/s]

Processing model.layers.13.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.13.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.56it/s][A

                                                                                                 [A
Merging models:  93%|█████████▎| 270/291 [01:15<00:05,  3.54it/s]

Processing model.layers.29.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.29.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.60it/s][A

Processing model.layers.29.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.24it/s][A

                                                                                                 [A
Merging models:  93%|█████████▎| 271/291 [01:15<00:05,  3.57it/s]

Processing model.layers.8.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.8.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.66it/s][A

Processing model.layers.8.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.36it/s][A

                                                                                                [A
Merging models:  93%|█████████▎| 272/291 [01:16<00:05,  3.61it/s]

Processing model.layers.27.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.27.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.79it/s][A

Processing model.layers.27.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.62it/s][A

                                                                                            [A
Merging models:  94%|█████████▍| 273/291 [01:16<00:05,  3.09it/s]

Processing model.layers.13.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.13.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.17it/s][A

                                                                                                [A
Merging models:  94%|█████████▍| 274/291 [01:16<00:04,  3.57it/s]

Processing model.layers.16.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.16.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.54it/s][A

                                                                                                 [A
Merging models:  95%|█████████▍| 275/291 [01:16<00:04,  3.95it/s]

Processing model.layers.15.input_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.15.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s][A

                                                                                                [A
Merging models:  95%|█████████▍| 276/291 [01:17<00:03,  4.33it/s]

Processing model.layers.2.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.2.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.95it/s][A

Processing model.layers.2.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  3.44it/s][A

                                                                                             [A
Merging models:  95%|█████████▌| 277/291 [01:17<00:04,  3.05it/s]

Processing model.layers.3.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.3.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.81it/s][A

Processing model.layers.3.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.64it/s][A

                                                                                             [A
Merging models:  96%|█████████▌| 278/291 [01:18<00:04,  2.79it/s]

Processing model.layers.4.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.4.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.74it/s][A

Processing model.layers.4.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.37it/s][A

                                                                                                [A
Merging models:  96%|█████████▌| 279/291 [01:18<00:03,  3.01it/s]

Processing model.layers.23.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.23.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.63it/s][A

Processing model.layers.23.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.34it/s][A

                                                                                                 [A
Merging models:  96%|█████████▌| 280/291 [01:18<00:03,  3.19it/s]

Processing model.layers.31.post_attention_layernorm.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.31.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s][A

                                                                                                         [A
Merging models:  97%|█████████▋| 281/291 [01:18<00:02,  3.66it/s]

Processing model.layers.12.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.12.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.81it/s][A

Processing model.layers.12.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.60it/s][A

                                                                                              [A
Merging models:  97%|█████████▋| 282/291 [01:19<00:02,  3.12it/s]

Processing model.layers.7.mlp.down_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.7.mlp.down_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.89it/s][A

Processing model.layers.7.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.64it/s][A

                                                                                             [A
Merging models:  97%|█████████▋| 283/291 [01:19<00:02,  2.83it/s]

Processing model.layers.9.mlp.up_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.9.mlp.up_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.94it/s][A

Processing model.layers.9.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  3.43it/s][A

                                                                                           [A
Merging models:  98%|█████████▊| 284/291 [01:20<00:02,  2.41it/s]

Processing model.layers.15.mlp.gate_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.15.mlp.gate_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  4.71it/s][A

Processing model.layers.15.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  4.35it/s][A

                                                                                              [A
Merging models:  98%|█████████▊| 285/291 [01:20<00:02,  2.34it/s]

Processing model.layers.26.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.26.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  5.75it/s][A

Processing model.layers.26.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  6.51it/s][A

                                                                                                 [A
Merging models:  98%|█████████▊| 286/291 [01:21<00:01,  2.55it/s]

Processing model.layers.29.self_attn.k_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.29.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.51it/s][A

                                                                                                 [A
Merging models:  99%|█████████▊| 287/291 [01:21<00:01,  3.01it/s]

Processing model.layers.5.self_attn.q_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.5.self_attn.q_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.60it/s][A

Processing model.layers.5.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.29it/s][A

                                                                                                [A
Merging models:  99%|█████████▉| 288/291 [01:21<00:00,  3.18it/s]

Processing model.layers.18.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.18.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  8.02it/s][A

Processing model.layers.18.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.48it/s][A

                                                                                                 [A
Merging models:  99%|█████████▉| 289/291 [01:21<00:00,  3.34it/s]

Processing model.layers.2.self_attn.v_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.2.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.27it/s][A

                                                                                                [A
Merging models: 100%|█████████▉| 290/291 [01:21<00:00,  3.73it/s]

Processing model.layers.3.self_attn.o_proj.weight:   0%|          | 0/2 [00:00<?, ?it/s][A

Processing model.layers.3.self_attn.o_proj.weight:  50%|█████     | 1/2 [00:00<00:00,  7.83it/s][A

Processing model.layers.3.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00,  7.56it/s][A

                                                                                                [A
Merging models: 100%|██████████| 291/291 [01:22<00:00,  3.75it/s]
Merging models: 100%|██████████| 291/291 [01:22<00:00,  3.54it/s]
create a temporary file to store mixed weights: /tmp/tmpp5xfq8r2.ckpt
***weight for each model***:
/media/hangyu5/Home/Documents/Hugging-Face/LM_cocktail/Mistral-7B-Instruct-v0.2 0.5
/media/hangyu5/Home/Documents/Hugging-Face/LM_cocktail/xDAN-L1-Chat-RL-v1 0.5

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.embed_tokens.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '

Loading checkpoint shards:  33%|███▎      | 1/3 [00:00<00:00,  9.42it/s]
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '

Loading checkpoint shards:  67%|██████▋   | 2/3 [00:00<00:00,  9.31it/s]
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.norm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for lm_head.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '

Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00,  9.42it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Using pad_token, but it is not set yet.
Remove temporary file: /tmp/tmpp5xfq8r2.ckpt
Remove temporary directory: /tmp/tmpc0jeswnf
Saving the new model to ./mixed_llm
MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
      )
    )
    (norm): MistralRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)