1667 lines
281 KiB
Plaintext
create a temporary directory: /tmp/tmpc0jeswnf
loading /media/hangyu5/Home/Documents/Hugging-Face/LM_cocktail/Mistral-7B-Instruct-v0.2 -----------------
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:00<00:01, 1.64it/s]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:01<00:00, 1.63it/s]
Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00, 1.69it/s]
Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00, 1.67it/s]
loading /media/hangyu5/Home/Documents/Hugging-Face/LM_cocktail/xDAN-L1-Chat-RL-v1 -----------------
Loading checkpoint shards: 0%| | 0/33 [00:00<?, ?it/s]
Loading checkpoint shards: 3%|▎ | 1/33 [00:00<00:03, 9.51it/s]
Loading checkpoint shards: 6%|▌ | 2/33 [00:00<00:03, 9.76it/s]
Loading checkpoint shards: 12%|█▏ | 4/33 [00:00<00:02, 9.97it/s]
Loading checkpoint shards: 15%|█▌ | 5/33 [00:00<00:02, 9.93it/s]
Loading checkpoint shards: 21%|██ | 7/33 [00:00<00:02, 10.00it/s]
Loading checkpoint shards: 24%|██▍ | 8/33 [00:00<00:02, 9.97it/s]
Loading checkpoint shards: 27%|██▋ | 9/33 [00:00<00:02, 9.55it/s]
Loading checkpoint shards: 30%|███ | 10/33 [00:01<00:02, 9.30it/s]
Loading checkpoint shards: 33%|███▎ | 11/33 [00:01<00:02, 9.34it/s]
Loading checkpoint shards: 39%|███▉ | 13/33 [00:01<00:02, 9.86it/s]
Loading checkpoint shards: 45%|████▌ | 15/33 [00:01<00:01, 10.14it/s]
Loading checkpoint shards: 52%|█████▏ | 17/33 [00:01<00:01, 10.32it/s]
Loading checkpoint shards: 58%|█████▊ | 19/33 [00:01<00:01, 10.43it/s]
Loading checkpoint shards: 64%|██████▎ | 21/33 [00:02<00:01, 10.50it/s]
Loading checkpoint shards: 70%|██████▉ | 23/33 [00:02<00:00, 10.57it/s]
Loading checkpoint shards: 76%|███████▌ | 25/33 [00:02<00:00, 10.61it/s]
Loading checkpoint shards: 82%|████████▏ | 27/33 [00:02<00:00, 10.66it/s]
Loading checkpoint shards: 88%|████████▊ | 29/33 [00:02<00:00, 10.70it/s]
Loading checkpoint shards: 94%|█████████▍| 31/33 [00:03<00:00, 10.72it/s]
Loading checkpoint shards: 100%|██████████| 33/33 [00:03<00:00, 10.65it/s]
Loading checkpoint shards: 100%|██████████| 33/33 [00:03<00:00, 10.30it/s]
Merging models: 0%| | 0/291 [00:00<?, ?it/s]
Processing model.layers.26.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.26.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.61it/s][A
Processing model.layers.26.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.22it/s][A
Merging models: 0%| | 1/291 [00:00<01:19, 3.63it/s]
Processing model.layers.5.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.5.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.29it/s][A
Processing model.layers.5.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.10it/s][A
Merging models: 1%| | 2/291 [00:00<01:20, 3.59it/s]
Processing model.layers.16.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.16.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.78it/s][A
Merging models: 1%| | 3/291 [00:00<01:08, 4.23it/s]
Processing model.layers.25.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.25.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.30it/s][A
Processing model.layers.25.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 6.75it/s][A
Merging models: 1%|▏ | 4/291 [00:01<01:14, 3.86it/s]
Processing model.layers.18.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.18.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.84it/s][A
Merging models: 2%|▏ | 5/291 [00:01<01:06, 4.31it/s]
Processing model.layers.12.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.12.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.37it/s][A
Processing model.layers.12.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 9.09it/s][A
Merging models: 2%|▏ | 6/291 [00:01<01:04, 4.39it/s]
Processing model.layers.14.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.14.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.32it/s][A
Processing model.layers.14.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.10it/s][A
Merging models: 2%|▏ | 7/291 [00:01<01:09, 4.08it/s]
Processing model.layers.11.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.11.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.24it/s][A
Processing model.layers.11.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.03it/s][A
Merging models: 3%|▎ | 8/291 [00:02<01:12, 3.88it/s]
Processing model.layers.26.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.26.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.30it/s][A
Processing model.layers.26.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 5.56it/s][A
Merging models: 3%|▎ | 9/291 [00:02<01:20, 3.50it/s]
Processing model.layers.5.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.5.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.27it/s][A
Merging models: 3%|▎ | 10/291 [00:02<01:10, 3.96it/s]
Processing model.layers.27.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.27.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.27it/s][A
Merging models: 4%|▍ | 11/291 [00:02<01:04, 4.36it/s]
Processing model.layers.3.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.3.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.06it/s][A
Processing model.layers.3.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.77it/s][A
Merging models: 4%|▍ | 12/291 [00:03<01:19, 3.49it/s]
Processing model.layers.30.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.30.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.23it/s][A
Merging models: 4%|▍ | 13/291 [00:03<01:10, 3.94it/s]
Processing model.layers.6.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.6.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.05it/s][A
Processing model.layers.6.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.78it/s][A
Merging models: 5%|▍ | 14/291 [00:03<01:23, 3.30it/s]
Processing model.layers.13.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.13.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.25it/s][A
Merging models: 5%|▌ | 15/291 [00:03<01:13, 3.77it/s]
Processing model.layers.28.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.28.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.13it/s][A
Processing model.layers.28.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 8.94it/s][A
Merging models: 5%|▌ | 16/291 [00:04<01:09, 3.96it/s]
Processing model.layers.19.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.19.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.74it/s][A
Merging models: 6%|▌ | 17/291 [00:04<01:03, 4.30it/s]
Processing model.layers.25.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.25.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s][A
Merging models: 6%|▌ | 18/291 [00:04<00:59, 4.62it/s]
Processing model.layers.20.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.20.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.99it/s][A
Processing model.layers.20.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.72it/s][A
Merging models: 7%|▋ | 19/291 [00:04<01:15, 3.60it/s]
Processing model.layers.3.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.3.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.17it/s][A
Processing model.layers.3.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 8.94it/s][A
Merging models: 7%|▋ | 20/291 [00:05<01:10, 3.82it/s]
Processing model.layers.28.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.28.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.02it/s][A
Processing model.layers.28.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.75it/s][A
Merging models: 7%|▋ | 21/291 [00:05<01:23, 3.24it/s]
Processing model.layers.10.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.10.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.16it/s][A
Processing model.layers.10.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 8.98it/s][A
Merging models: 8%|▊ | 22/291 [00:05<01:16, 3.54it/s]
Processing model.layers.20.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.20.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.10it/s][A
Processing model.layers.20.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.79it/s][A
Merging models: 8%|▊ | 23/291 [00:06<01:26, 3.10it/s]
Processing model.layers.23.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.23.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s][A
Merging models: 8%|▊ | 24/291 [00:06<01:14, 3.58it/s]
Processing model.layers.8.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.8.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.70it/s][A
Merging models: 9%|▊ | 25/291 [00:06<01:06, 3.97it/s]
Processing model.layers.3.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.3.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.73it/s][A
Merging models: 9%|▉ | 26/291 [00:06<01:01, 4.31it/s]
Processing model.layers.7.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.7.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.62it/s][A
Merging models: 9%|▉ | 27/291 [00:06<00:57, 4.56it/s]
Processing model.layers.25.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.25.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.17it/s][A
Processing model.layers.25.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 8.92it/s][A
Merging models: 10%|▉ | 28/291 [00:07<00:57, 4.54it/s]
Processing model.layers.26.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.26.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.80it/s][A
Processing model.layers.26.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 8.98it/s][A
Merging models: 10%|▉ | 29/291 [00:07<00:58, 4.51it/s]
Processing model.layers.17.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.17.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.68it/s][A
Merging models: 10%|█ | 30/291 [00:07<00:55, 4.73it/s]
Processing model.layers.13.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.13.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.34it/s][A
Processing model.layers.13.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 9.04it/s][A
Merging models: 11%|█ | 31/291 [00:07<00:55, 4.67it/s]
Processing model.layers.28.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.28.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.02it/s][A
Processing model.layers.28.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.75it/s][A
Merging models: 11%|█ | 32/291 [00:08<01:11, 3.63it/s]
Processing model.layers.28.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.28.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.72it/s][A
Merging models: 11%|█▏ | 33/291 [00:08<01:04, 4.02it/s]
Processing model.layers.6.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.6.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.23it/s][A
Merging models: 12%|█▏ | 34/291 [00:08<00:58, 4.39it/s]
Processing model.layers.4.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.4.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.71it/s][A
Merging models: 12%|█▏ | 35/291 [00:08<00:55, 4.64it/s]
Processing model.layers.4.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.4.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.32it/s][A
Processing model.layers.4.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 9.28it/s][A
Merging models: 12%|█▏ | 36/291 [00:08<00:54, 4.64it/s]
Processing model.layers.15.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.15.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.09it/s][A
Processing model.layers.15.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.77it/s][A
Merging models: 13%|█▎ | 37/291 [00:09<01:10, 3.63it/s]
Processing model.layers.19.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.19.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.29it/s][A
Processing model.layers.19.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 9.00it/s][A
Merging models: 13%|█▎ | 38/291 [00:09<01:05, 3.85it/s]
Processing model.layers.18.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.18.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.03it/s][A
Processing model.layers.18.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.72it/s][A
Merging models: 13%|█▎ | 39/291 [00:10<01:17, 3.25it/s]
Processing model.layers.11.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.11.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.07it/s][A
Processing model.layers.11.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.75it/s][A
Merging models: 14%|█▎ | 40/291 [00:10<01:25, 2.94it/s]
Processing model.layers.14.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.14.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.15it/s][A
Processing model.layers.14.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 9.09it/s][A
Merging models: 14%|█▍ | 41/291 [00:10<01:16, 3.28it/s]
Processing model.layers.6.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.6.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.76it/s][A
Merging models: 14%|█▍ | 42/291 [00:10<01:07, 3.72it/s]
Processing model.layers.2.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.2.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s][A
Merging models: 15%|█▍ | 43/291 [00:11<01:00, 4.13it/s]
Processing model.layers.18.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.18.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.75it/s][A
Merging models: 15%|█▌ | 44/291 [00:11<00:55, 4.44it/s]
Processing model.layers.12.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.12.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.63it/s][A
Merging models: 15%|█▌ | 45/291 [00:11<00:52, 4.67it/s]
Processing model.layers.0.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.0.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.70it/s][A
Merging models: 16%|█▌ | 46/291 [00:11<00:50, 4.85it/s]
Processing model.layers.0.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.0.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.02it/s][A
Processing model.layers.0.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.71it/s][A
Merging models: 16%|█▌ | 47/291 [00:12<01:06, 3.70it/s]
Processing model.layers.10.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.10.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.01it/s][A
Processing model.layers.10.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.74it/s][A
Merging models: 16%|█▋ | 48/291 [00:12<01:16, 3.17it/s]
Processing model.layers.15.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.15.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.79it/s][A
Merging models: 17%|█▋ | 49/291 [00:12<01:06, 3.62it/s]
Processing model.layers.24.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.24.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.74it/s][A
Merging models: 17%|█▋ | 50/291 [00:12<01:00, 4.01it/s]
Processing model.layers.24.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.24.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.05it/s][A
Processing model.layers.24.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 9.06it/s][A
Merging models: 18%|█▊ | 51/291 [00:13<00:57, 4.15it/s]
Processing model.layers.1.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.1.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s][A
Merging models: 18%|█▊ | 52/291 [00:13<00:53, 4.50it/s]
Processing model.layers.4.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.4.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.73it/s][A
Merging models: 18%|█▊ | 53/291 [00:13<00:50, 4.72it/s]
Processing model.layers.1.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.1.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.98it/s][A
Processing model.layers.1.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.72it/s][A
Merging models: 19%|█▊ | 54/291 [00:13<01:05, 3.64it/s]
Processing model.layers.19.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.19.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.83it/s][A
Merging models: 19%|█▉ | 55/291 [00:13<00:58, 4.04it/s]
Processing model.layers.16.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.16.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s][A
Merging models: 19%|█▉ | 56/291 [00:14<00:53, 4.41it/s]
Processing model.layers.31.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.31.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.91it/s][A
Processing model.layers.31.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.68it/s][A
Merging models: 20%|█▉ | 57/291 [00:14<01:07, 3.49it/s]
Processing model.layers.17.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.17.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 10.91it/s][A
Merging models: 20%|█▉ | 58/291 [00:14<00:59, 3.91it/s]
Processing model.layers.22.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.22.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.82it/s][A
Merging models: 20%|██ | 59/291 [00:14<00:54, 4.26it/s]
Processing model.layers.29.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.29.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s][A
Merging models: 21%|██ | 60/291 [00:15<00:50, 4.59it/s]
Processing model.layers.7.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.7.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.17it/s][A
Processing model.layers.7.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 8.10it/s][A
Merging models: 21%|██ | 61/291 [00:15<00:51, 4.44it/s]
Processing model.layers.30.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.30.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.80it/s][A
Processing model.layers.30.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.64it/s][A
Merging models: 21%|██▏ | 62/291 [00:15<01:05, 3.49it/s]
Processing model.layers.21.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.21.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.75it/s][A
Merging models: 22%|██▏ | 63/291 [00:16<00:58, 3.90it/s]
Processing model.layers.26.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.26.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.75it/s][A
Merging models: 22%|██▏ | 64/291 [00:16<00:53, 4.24it/s]
Processing model.layers.1.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.1.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.11it/s][A
Processing model.layers.1.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.57it/s][A
Merging models: 22%|██▏ | 65/291 [00:16<00:55, 4.11it/s]
Processing model.layers.24.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.24.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.81it/s][A
Processing model.layers.24.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.66it/s][A
Merging models: 23%|██▎ | 66/291 [00:16<01:07, 3.35it/s]
Processing model.layers.19.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.19.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s][A
Merging models: 23%|██▎ | 67/291 [00:17<00:58, 3.80it/s]
Processing model.layers.25.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.25.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.79it/s][A
Processing model.layers.25.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.48it/s][A
Merging models: 23%|██▎ | 68/291 [00:17<00:58, 3.79it/s]
Processing model.layers.12.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.12.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.17it/s][A
Processing model.layers.12.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.62it/s][A
Merging models: 24%|██▎ | 69/291 [00:17<00:58, 3.81it/s]
Processing model.layers.31.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.31.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.85it/s][A
Processing model.layers.31.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.69it/s][A
Merging models: 24%|██▍ | 70/291 [00:18<01:08, 3.21it/s]
Processing model.layers.27.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.27.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.21it/s][A
Processing model.layers.27.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.60it/s][A
Merging models: 24%|██▍ | 71/291 [00:18<01:05, 3.38it/s]
Processing model.layers.28.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.28.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.25it/s][A
Merging models: 25%|██▍ | 72/291 [00:18<00:57, 3.83it/s]
Processing model.layers.17.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.17.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.92it/s][A
Processing model.layers.17.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.73it/s][A
Merging models: 25%|██▌ | 73/291 [00:18<01:07, 3.24it/s]
Processing model.layers.7.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.7.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.27it/s][A
Merging models: 25%|██▌ | 74/291 [00:19<00:58, 3.71it/s]
Processing model.layers.4.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.4.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.00it/s][A
Processing model.layers.4.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.72it/s][A
Merging models: 26%|██▌ | 75/291 [00:19<01:08, 3.18it/s]
Processing model.layers.18.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.18.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s][A
Merging models: 26%|██▌ | 76/291 [00:19<00:58, 3.65it/s]
Processing model.layers.10.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.10.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.80it/s][A
Processing model.layers.10.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.52it/s][A
Merging models: 26%|██▋ | 77/291 [00:19<00:58, 3.69it/s]
Processing model.layers.6.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.6.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.49it/s][A
Merging models: 27%|██▋ | 78/291 [00:20<00:52, 4.04it/s]
Processing model.layers.2.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.2.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.89it/s][A
Processing model.layers.2.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.61it/s][A
Merging models: 27%|██▋ | 79/291 [00:20<00:53, 3.97it/s]
Processing model.layers.22.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.22.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.14it/s][A
Processing model.layers.22.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.63it/s][A
Merging models: 27%|██▋ | 80/291 [00:20<00:53, 3.93it/s]
Processing model.layers.7.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.7.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.58it/s][A
Merging models: 28%|██▊ | 81/291 [00:20<00:49, 4.26it/s]
Processing model.layers.4.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.4.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.25it/s][A
Merging models: 28%|██▊ | 82/291 [00:20<00:45, 4.59it/s]
Processing model.layers.31.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.31.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.74it/s][A
Merging models: 29%|██▊ | 83/291 [00:21<00:43, 4.80it/s]
Processing model.layers.1.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.1.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.12it/s][A
Processing model.layers.1.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.91it/s][A
Merging models: 29%|██▉ | 84/291 [00:21<00:55, 3.74it/s]
Processing model.layers.29.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.29.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.12it/s][A
Processing model.layers.29.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.82it/s][A
Merging models: 29%|██▉ | 85/291 [00:22<01:04, 3.22it/s]
Processing model.layers.20.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.20.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.40it/s][A
Merging models: 30%|██▉ | 86/291 [00:22<00:56, 3.63it/s]
Processing model.layers.20.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.20.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.23it/s][A
Merging models: 30%|██▉ | 87/291 [00:22<00:50, 4.06it/s]
Processing model.layers.22.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.22.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.14it/s][A
Processing model.layers.22.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.81it/s][A
Merging models: 30%|███ | 88/291 [00:22<01:00, 3.38it/s]
Processing model.layers.23.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.23.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.75it/s][A
Merging models: 31%|███ | 89/291 [00:22<00:53, 3.80it/s]
Processing model.layers.25.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.25.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.58it/s][A
Merging models: 31%|███ | 90/291 [00:23<00:48, 4.15it/s]
Processing model.layers.19.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.19.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.14it/s][A
Processing model.layers.19.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.86it/s][A
Merging models: 31%|███▏ | 91/291 [00:23<00:58, 3.43it/s]
Processing model.layers.6.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.6.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.26it/s][A
|
||
|
||
Merging models: 32%|███▏ | 92/291 [00:23<00:51, 3.89it/s]
|
||
|
||
Processing model.layers.21.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.21.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.99it/s][A
|
||
|
||
Processing model.layers.21.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.68it/s][A
|
||
|
||
Merging models: 32%|███▏ | 93/291 [00:24<00:51, 3.88it/s]
|
||
|
||
Processing model.layers.0.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.0.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.85it/s][A
|
||
|
||
Processing model.layers.0.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.74it/s][A
|
||
|
||
Merging models: 32%|███▏ | 94/291 [00:24<01:00, 3.26it/s]
|
||
|
||
Processing model.layers.8.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.8.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.60it/s][A
|
||
|
||
Merging models: 33%|███▎ | 95/291 [00:24<00:53, 3.68it/s]
|
||
|
||
Processing model.layers.0.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.0.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.52it/s][A
|
||
|
||
Processing model.layers.0.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.86it/s][A
|
||
|
||
Merging models: 33%|███▎ | 96/291 [00:24<00:51, 3.76it/s]
|
||
|
||
Processing model.layers.21.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.21.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.25it/s][A
|
||
|
||
Merging models: 33%|███▎ | 97/291 [00:25<00:46, 4.18it/s]
|
||
|
||
Processing model.layers.5.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.5.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.71it/s][A
|
||
|
||
Merging models: 34%|███▎ | 98/291 [00:25<00:43, 4.47it/s]
|
||
|
||
Processing model.layers.1.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.1.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.05it/s][A
|
||
|
||
Processing model.layers.1.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.84it/s][A
|
||
|
||
Merging models: 34%|███▍ | 99/291 [00:25<00:53, 3.57it/s]
|
||
|
||
Processing model.layers.25.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.25.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.10it/s][A
|
||
|
||
Processing model.layers.25.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.81it/s][A
|
||
|
||
Merging models: 34%|███▍ | 100/291 [00:26<01:01, 3.13it/s]
|
||
|
||
Processing model.layers.26.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.26.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.26it/s][A
|
||
|
||
Merging models: 35%|███▍ | 101/291 [00:26<00:52, 3.61it/s]
|
||
|
||
Processing model.layers.31.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.31.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.44it/s][A
|
||
|
||
Processing model.layers.31.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.88it/s][A
|
||
|
||
Merging models: 35%|███▌ | 102/291 [00:26<00:50, 3.71it/s]
|
||
|
||
Processing model.layers.11.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.11.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.26it/s][A
|
||
|
||
Merging models: 35%|███▌ | 103/291 [00:26<00:45, 4.13it/s]
|
||
|
||
Processing model.layers.2.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.2.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.02it/s][A
|
||
|
||
Processing model.layers.2.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.75it/s][A
|
||
|
||
Merging models: 36%|███▌ | 104/291 [00:26<00:46, 4.05it/s]
|
||
|
||
Processing model.layers.25.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.25.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 8.84it/s][A
|
||
|
||
Merging models: 36%|███▌ | 105/291 [00:27<00:44, 4.16it/s]
|
||
|
||
Processing model.layers.13.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.13.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.02it/s][A
|
||
|
||
Processing model.layers.13.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.82it/s][A
|
||
|
||
Merging models: 36%|███▋ | 106/291 [00:27<00:54, 3.42it/s]
|
||
|
||
Processing model.layers.4.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.4.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.02it/s][A
|
||
|
||
Processing model.layers.4.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.78it/s][A
|
||
|
||
Merging models: 37%|███▋ | 107/291 [00:27<01:00, 3.04it/s]
|
||
|
||
Processing model.layers.29.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.29.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.99it/s][A
|
||
|
||
Processing model.layers.29.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.57it/s][A
|
||
|
||
Merging models: 37%|███▋ | 108/291 [00:28<01:05, 2.78it/s]
|
||
|
||
Processing model.layers.20.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.20.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 10.10it/s][A
|
||
|
||
Merging models: 37%|███▋ | 109/291 [00:28<00:56, 3.21it/s]
|
||
|
||
Processing model.layers.16.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.16.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.06it/s][A
|
||
|
||
Processing model.layers.16.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.79it/s][A
|
||
|
||
Merging models: 38%|███▊ | 110/291 [00:29<01:02, 2.92it/s]
|
||
|
||
Processing model.layers.23.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.23.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.09it/s][A
|
||
|
||
Processing model.layers.23.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.80it/s][A
|
||
|
||
Merging models: 38%|███▊ | 111/291 [00:29<01:05, 2.75it/s]
|
||
|
||
Processing model.layers.6.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.6.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.13it/s][A
|
||
|
||
Processing model.layers.6.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.81it/s][A
|
||
|
||
Merging models: 38%|███▊ | 112/291 [00:29<01:07, 2.64it/s]
|
||
|
||
Processing model.layers.29.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.29.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.24it/s][A
|
||
|
||
Merging models: 39%|███▉ | 113/291 [00:30<00:56, 3.14it/s]
|
||
|
||
Processing model.layers.12.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.12.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.09it/s][A
|
||
|
||
Merging models: 39%|███▉ | 114/291 [00:30<00:49, 3.61it/s]
|
||
|
||
Processing model.layers.12.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.12.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.46it/s][A
|
||
|
||
Merging models: 40%|███▉ | 115/291 [00:30<00:44, 3.98it/s]
|
||
|
||
Processing model.layers.18.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.18.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.07it/s][A
|
||
|
||
Processing model.layers.18.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.81it/s][A
|
||
|
||
Merging models: 40%|███▉ | 116/291 [00:30<00:52, 3.33it/s]
|
||
|
||
Processing model.layers.26.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.26.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s][A
|
||
|
||
Merging models: 40%|████ | 117/291 [00:30<00:45, 3.79it/s]
|
||
|
||
Processing model.embed_tokens.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.embed_tokens.weight: 50%|█████ | 1/2 [00:00<00:00, 2.18it/s][A
|
||
|
||
Processing model.embed_tokens.weight: 100%|██████████| 2/2 [00:00<00:00, 2.58it/s][A
|
||
|
||
Merging models: 41%|████ | 118/291 [00:31<01:13, 2.36it/s]
|
||
|
||
Processing model.layers.13.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.13.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.73it/s][A
|
||
|
||
Merging models: 41%|████ | 119/291 [00:31<01:00, 2.83it/s]
|
||
|
||
Processing model.layers.10.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.10.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.16it/s][A
|
||
|
||
Processing model.layers.10.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.87it/s][A
|
||
|
||
Merging models: 41%|████ | 120/291 [00:32<01:03, 2.71it/s]
|
||
|
||
Processing model.layers.0.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.0.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.42it/s][A
|
||
|
||
Processing model.layers.0.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.78it/s][A
|
||
|
||
Merging models: 42%|████▏ | 121/291 [00:32<00:56, 2.99it/s]
|
||
|
||
Processing model.layers.29.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.29.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.60it/s][A
|
||
|
||
Processing model.layers.29.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.64it/s][A
|
||
|
||
Merging models: 42%|████▏ | 122/291 [00:33<01:01, 2.75it/s]
|
||
|
||
Processing model.layers.30.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.30.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.07it/s][A
|
||
|
||
Processing model.layers.30.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.72it/s][A
|
||
|
||
Merging models: 42%|████▏ | 123/291 [00:33<01:03, 2.63it/s]
|
||
|
||
Processing model.layers.12.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.12.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.10it/s][A
|
||
|
||
Processing model.layers.12.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.75it/s][A
|
||
|
||
Merging models: 43%|████▎ | 124/291 [00:33<01:05, 2.55it/s]
|
||
|
||
Processing model.layers.27.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.27.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.06it/s][A
|
||
|
||
Merging models: 43%|████▎ | 125/291 [00:34<00:54, 3.04it/s]
|
||
|
||
Processing model.layers.22.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.22.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.06it/s][A
|
||
|
||
Processing model.layers.22.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.59it/s][A
|
||
|
||
Merging models: 43%|████▎ | 126/291 [00:34<00:59, 2.79it/s]
|
||
|
||
Processing model.layers.28.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.28.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.07it/s][A
|
||
|
||
Processing model.layers.28.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.77it/s][A
|
||
|
||
Merging models: 44%|████▎ | 127/291 [00:34<01:01, 2.66it/s]
|
||
|
||
Processing model.layers.29.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.29.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.64it/s][A
|
||
|
||
Merging models: 44%|████▍ | 128/291 [00:35<00:52, 3.13it/s]
|
||
|
||
Processing model.layers.0.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.0.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.05it/s][A
|
||
|
||
Processing model.layers.0.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.77it/s][A
|
||
|
||
Merging models: 44%|████▍ | 129/291 [00:35<00:56, 2.87it/s]
|
||
|
||
Processing model.layers.27.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.27.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.09it/s][A
|
||
|
||
Processing model.layers.27.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.79it/s][A
|
||
|
||
Merging models: 45%|████▍ | 130/291 [00:35<00:59, 2.71it/s]
|
||
|
||
Processing model.layers.9.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.9.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.62it/s][A
|
||
|
||
Merging models: 45%|████▌ | 131/291 [00:36<00:50, 3.18it/s]
|
||
|
||
Processing model.layers.17.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.17.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.03it/s][A
|
||
|
||
Processing model.layers.17.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.60it/s][A
|
||
|
||
Merging models: 45%|████▌ | 132/291 [00:36<00:47, 3.35it/s]
|
||
|
||
Processing model.layers.22.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.22.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.63it/s][A
|
||
|
||
Merging models: 46%|████▌ | 133/291 [00:36<00:41, 3.77it/s]
|
||
|
||
Processing model.layers.9.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.9.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s][A
|
||
|
||
Merging models: 46%|████▌ | 134/291 [00:36<00:37, 4.17it/s]
|
||
|
||
Processing model.layers.16.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.16.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.06it/s][A
|
||
|
||
Processing model.layers.16.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.83it/s][A
|
||
|
||
Merging models: 46%|████▋ | 135/291 [00:37<00:45, 3.43it/s]
|
||
|
||
Processing model.layers.21.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.21.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.76it/s][A
|
||
|
||
Merging models: 47%|████▋ | 136/291 [00:37<00:40, 3.85it/s]
|
||
|
||
Processing model.layers.8.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.8.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.02it/s][A
|
||
|
||
Processing model.layers.8.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.62it/s][A
|
||
|
||
Merging models: 47%|████▋ | 137/291 [00:37<00:40, 3.84it/s]
|
||
|
||
Processing model.layers.31.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.31.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.91it/s][A
|
||
|
||
Processing model.layers.31.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.75it/s][A
|
||
|
||
Merging models: 47%|████▋ | 138/291 [00:38<00:47, 3.25it/s]
|
||
|
||
Processing model.layers.7.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.7.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.11it/s][A
|
||
|
||
Processing model.layers.7.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.82it/s][A
|
||
|
||
Merging models: 48%|████▊ | 139/291 [00:38<00:51, 2.95it/s]
|
||
|
||
Processing model.layers.15.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.15.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.12it/s][A
|
||
|
||
Processing model.layers.15.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.81it/s][A
|
||
|
||
Merging models: 48%|████▊ | 140/291 [00:38<00:54, 2.77it/s]
|
||
|
||
Processing model.layers.19.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.19.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.07it/s][A
|
||
|
||
Processing model.layers.19.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.75it/s][A
|
||
|
||
Merging models: 48%|████▊ | 141/291 [00:39<00:56, 2.65it/s]
|
||
|
||
Processing model.layers.20.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.20.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.62it/s][A
|
||
|
||
Merging models: 49%|████▉ | 142/291 [00:39<00:47, 3.11it/s]
|
||
|
||
Processing model.layers.25.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.25.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.21it/s][A
|
||
|
||
Merging models: 49%|████▉ | 143/291 [00:39<00:41, 3.59it/s]
|
||
|
||
Processing model.layers.26.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.26.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.50it/s][A
|
||
|
||
Merging models: 49%|████▉ | 144/291 [00:39<00:37, 3.96it/s]
|
||
|
||
Processing model.layers.24.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.24.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.07it/s][A
|
||
|
||
Processing model.layers.24.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.78it/s][A
|
||
|
||
Merging models: 50%|████▉ | 145/291 [00:40<00:43, 3.32it/s]
|
||
|
||
Processing model.layers.1.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.1.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.93it/s][A
|
||
|
||
Processing model.layers.1.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.65it/s][A
|
||
|
||
Merging models: 50%|█████ | 146/291 [00:40<00:41, 3.46it/s]
|
||
|
||
Processing model.layers.24.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.24.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.96it/s][A
|
||
|
||
Processing model.layers.24.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.64it/s][A
|
||
|
||
Merging models: 51%|█████ | 147/291 [00:40<00:40, 3.57it/s]
|
||
|
||
Processing model.layers.15.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.15.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.29it/s][A
|
||
|
||
Processing model.layers.15.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.79it/s][A
|
||
|
||
Merging models: 51%|█████ | 148/291 [00:41<00:39, 3.67it/s]
|
||
|
||
Processing model.layers.2.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.2.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.90it/s][A
|
||
|
||
Processing model.layers.2.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.76it/s][A
|
||
|
||
Merging models: 51%|█████ | 149/291 [00:41<00:44, 3.16it/s]
|
||
|
||
Processing model.layers.10.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.10.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.60it/s][A
|
||
|
||
Merging models: 52%|█████▏ | 150/291 [00:41<00:39, 3.59it/s]
|
||
|
||
Processing model.layers.11.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.11.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.28it/s][A
|
||
|
||
Merging models: 52%|█████▏ | 151/291 [00:41<00:34, 4.03it/s]
|
||
|
||
Processing model.layers.31.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.31.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.39it/s][A
|
||
|
||
Processing model.layers.31.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.81it/s][A
|
||
|
||
Merging models: 52%|█████▏ | 152/291 [00:42<00:34, 4.00it/s]
|
||
|
||
Processing model.layers.11.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.11.self_attn.v_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.89it/s][A
|
||
|
||
Processing model.layers.11.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.89it/s][A
|
||
|
||
Merging models: 53%|█████▎ | 153/291 [00:42<00:34, 3.98it/s]
|
||
|
||
Processing model.layers.23.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.23.self_attn.v_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.12it/s][A
|
||
|
||
Processing model.layers.23.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 8.37it/s][A
|
||
|
||
Merging models: 53%|█████▎ | 154/291 [00:42<00:33, 4.03it/s]
|
||
|
||
Processing model.layers.12.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.12.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.96it/s][A
|
||
|
||
Processing model.layers.12.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.74it/s][A
|
||
|
||
Merging models: 53%|█████▎ | 155/291 [00:42<00:40, 3.34it/s]
|
||
|
||
Processing model.layers.22.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.22.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.98it/s][A
|
||
|
||
Processing model.layers.22.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.67it/s][A
|
||
|
||
Merging models: 54%|█████▎ | 156/291 [00:43<00:45, 2.97it/s]
|
||
|
||
Processing model.layers.30.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.30.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.02it/s][A
|
||
|
||
Merging models: 54%|█████▍ | 157/291 [00:43<00:39, 3.38it/s]
|
||
|
||
Processing model.layers.2.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.2.self_attn.k_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.64it/s][A
|
||
|
||
Processing model.layers.2.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 8.93it/s][A
|
||
|
||
Merging models: 54%|█████▍ | 158/291 [00:43<00:36, 3.65it/s]
|
||
|
||
Processing model.layers.8.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.8.input_layernorm.weight: 50%|█████ | 1/2 [00:00<00:00, 8.69it/s][A
|
||
|
||
Processing model.layers.8.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 8.62it/s][A
|
||
|
||
Merging models: 55%|█████▍ | 159/291 [00:44<00:34, 3.83it/s]
|
||
|
||
Processing model.layers.12.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.12.post_attention_layernorm.weight: 50%|█████ | 1/2 [00:00<00:00, 8.66it/s][A
|
||
|
||
Processing model.layers.12.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 8.63it/s][A
|
||
|
||
Merging models: 55%|█████▍ | 160/291 [00:44<00:33, 3.96it/s]
|
||
|
||
Processing model.layers.17.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.17.input_layernorm.weight: 50%|█████ | 1/2 [00:00<00:00, 8.67it/s][A
|
||
|
||
Processing model.layers.17.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 8.65it/s][A
|
||
|
||
Merging models: 55%|█████▌ | 161/291 [00:44<00:32, 4.06it/s]
|
||
|
||
Processing model.layers.8.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.8.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.11it/s][A
|
||
|
||
Processing model.layers.8.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.85it/s][A
|
||
|
||
Merging models: 56%|█████▌ | 162/291 [00:44<00:38, 3.39it/s]
|
||
|
||
Processing model.layers.21.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.21.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s][A
|
||
|
||
Merging models: 56%|█████▌ | 163/291 [00:45<00:33, 3.84it/s]
|
||
|
||
Processing model.layers.24.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.24.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.21it/s][A
|
||
|
||
Merging models: 56%|█████▋ | 164/291 [00:45<00:29, 4.24it/s]
|
||
|
||
Processing model.layers.13.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.13.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.09it/s][A
|
||
|
||
Processing model.layers.13.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.79it/s][A
|
||
|
||
Merging models: 57%|█████▋ | 165/291 [00:45<00:36, 3.46it/s]
|
||
|
||
Processing model.layers.2.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.2.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.26it/s][A
|
||
|
||
Merging models: 57%|█████▋ | 166/291 [00:45<00:31, 3.91it/s]
|
||
|
||
Processing model.layers.0.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.0.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.24it/s][A
|
||
|
||
Merging models: 57%|█████▋ | 167/291 [00:46<00:28, 4.30it/s]
|
||
|
||
Processing lm_head.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing lm_head.weight: 50%|█████ | 1/2 [00:00<00:00, 2.22it/s][A
|
||
|
||
Processing lm_head.weight: 100%|██████████| 2/2 [00:00<00:00, 2.56it/s][A
|
||
|
||
Merging models: 58%|█████▊ | 168/291 [00:46<00:49, 2.48it/s]
|
||
|
||
Processing model.norm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.norm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s][A
|
||
|
||
Merging models: 58%|█████▊ | 169/291 [00:47<00:40, 2.98it/s]
|
||
|
||
Processing model.layers.9.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.9.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.23it/s][A
|
||
|
||
Processing model.layers.9.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.89it/s][A
|
||
|
||
Merging models: 58%|█████▊ | 170/291 [00:47<00:43, 2.80it/s]
|
||
|
||
Processing model.layers.20.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.20.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.70it/s][A
|
||
|
||
Processing model.layers.20.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.48it/s][A
|
||
|
||
Merging models: 59%|█████▉ | 171/291 [00:47<00:39, 3.03it/s]
|
||
|
||
Processing model.layers.15.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.15.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s][A
|
||
|
||
Merging models: 59%|█████▉ | 172/291 [00:47<00:33, 3.52it/s]
|
||
|
||
Processing model.layers.31.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.31.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.76it/s][A
|
||
|
||
Merging models: 59%|█████▉ | 173/291 [00:48<00:30, 3.92it/s]
|
||
|
||
Processing model.layers.14.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.14.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.23it/s][A
|
||
|
||
Merging models: 60%|█████▉ | 174/291 [00:48<00:27, 4.31it/s]
|
||
|
||
Processing model.layers.6.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.6.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.03it/s][A
|
||
|
||
Processing model.layers.6.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.80it/s][A
|
||
|
||
Merging models: 60%|██████ | 175/291 [00:48<00:33, 3.49it/s]
|
||
|
||
Processing model.layers.9.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.9.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.29it/s][A
|
||
|
||
Merging models: 60%|██████ | 176/291 [00:48<00:29, 3.94it/s]
|
||
|
||
Processing model.layers.27.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.27.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.76it/s][A
|
||
|
||
Merging models: 61%|██████ | 177/291 [00:49<00:26, 4.28it/s]
|
||
|
||
Processing model.layers.10.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.10.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.68it/s][A
|
||
|
||
Merging models: 61%|██████ | 178/291 [00:49<00:24, 4.55it/s]
|
||
|
||
Processing model.layers.11.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.11.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.00it/s][A
|
||
|
||
Processing model.layers.11.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.64it/s][A
|
||
|
||
Merging models: 62%|██████▏ | 179/291 [00:49<00:25, 4.31it/s]
|
||
|
||
Processing model.layers.11.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.11.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.96it/s][A
|
||
|
||
Processing model.layers.11.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.79it/s][A
|
||
|
||
Merging models: 62%|██████▏ | 180/291 [00:49<00:31, 3.48it/s]
|
||
|
||
Processing model.layers.15.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.15.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.93it/s][A
|
||
|
||
Processing model.layers.15.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.59it/s][A
|
||
|
||
Merging models: 62%|██████▏ | 181/291 [00:50<00:30, 3.58it/s]
|
||
|
||
Processing model.layers.15.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.15.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.60it/s][A
|
||
|
||
Merging models: 63%|██████▎ | 182/291 [00:50<00:27, 3.96it/s]
|
||
|
||
Processing model.layers.31.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.31.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s][A
|
||
|
||
Merging models: 63%|██████▎ | 183/291 [00:50<00:24, 4.34it/s]
|
||
|
||
Processing model.layers.7.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.7.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.38it/s][A
|
||
|
||
Processing model.layers.7.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.76it/s][A
|
||
|
||
Merging models: 63%|██████▎ | 184/291 [00:50<00:25, 4.21it/s]
|
||
|
||
Processing model.layers.5.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.5.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.89it/s][A
|
||
|
||
Processing model.layers.5.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.68it/s][A
|
||
|
||
Merging models: 64%|██████▎ | 185/291 [00:51<00:25, 4.09it/s]
|
||
|
||
Processing model.layers.28.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.28.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s][A
|
||
|
||
Merging models: 64%|██████▍ | 186/291 [00:51<00:23, 4.45it/s]
|
||
|
||
Processing model.layers.20.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
|
||
|
||
Processing model.layers.20.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.98it/s][A
|
||
|
||
Processing model.layers.20.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.80it/s][A
Merging models: 64%|██████▍ | 187/291 [00:51<00:29, 3.55it/s]
Processing model.layers.16.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.16.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.04it/s][A
Processing model.layers.16.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.77it/s][A
Merging models: 65%|██████▍ | 188/291 [00:52<00:33, 3.10it/s]
Processing model.layers.21.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.21.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.83it/s][A
Processing model.layers.21.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.51it/s][A
Merging models: 65%|██████▍ | 189/291 [00:52<00:31, 3.28it/s]
Processing model.layers.2.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.2.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.85it/s][A
Processing model.layers.2.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.69it/s][A
Merging models: 65%|██████▌ | 190/291 [00:52<00:34, 2.93it/s]
Processing model.layers.3.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.3.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.81it/s][A
Merging models: 66%|██████▌ | 191/291 [00:52<00:29, 3.40it/s]
Processing model.layers.14.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.14.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.66it/s][A
Merging models: 66%|██████▌ | 192/291 [00:53<00:25, 3.81it/s]
Processing model.layers.0.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.0.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.54it/s][A
Merging models: 66%|██████▋ | 193/291 [00:53<00:23, 4.15it/s]
Processing model.layers.24.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.24.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.23it/s][A
Merging models: 67%|██████▋ | 194/291 [00:53<00:21, 4.50it/s]
Processing model.layers.23.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.23.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.11it/s][A
Processing model.layers.23.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.79it/s][A
Merging models: 67%|██████▋ | 195/291 [00:53<00:26, 3.58it/s]
Processing model.layers.14.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.14.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.62it/s][A
Merging models: 67%|██████▋ | 196/291 [00:54<00:23, 3.96it/s]
Processing model.layers.6.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.6.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.31it/s][A
Processing model.layers.6.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.71it/s][A
Merging models: 68%|██████▊ | 197/291 [00:54<00:23, 3.94it/s]
Processing model.layers.13.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.13.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.90it/s][A
Processing model.layers.13.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.70it/s][A
Merging models: 68%|██████▊ | 198/291 [00:54<00:28, 3.28it/s]
Processing model.layers.27.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.27.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.34it/s][A
Processing model.layers.27.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.71it/s][A
Merging models: 68%|██████▊ | 199/291 [00:55<00:26, 3.44it/s]
Processing model.layers.29.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.29.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.32it/s][A
Processing model.layers.29.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.67it/s][A
Merging models: 69%|██████▊ | 200/291 [00:55<00:25, 3.56it/s]
Processing model.layers.16.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.16.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.84it/s][A
Processing model.layers.16.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.53it/s][A
Merging models: 69%|██████▉ | 201/291 [00:55<00:24, 3.63it/s]
Processing model.layers.27.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.27.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.56it/s][A
Merging models: 69%|██████▉ | 202/291 [00:55<00:22, 4.00it/s]
Processing model.layers.14.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.14.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.87it/s][A
Processing model.layers.14.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.57it/s][A
Merging models: 70%|██████▉ | 203/291 [00:56<00:22, 3.94it/s]
Processing model.layers.30.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.30.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.17it/s][A
Merging models: 70%|███████ | 204/291 [00:56<00:20, 4.32it/s]
Processing model.layers.3.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.3.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.17it/s][A
Merging models: 70%|███████ | 205/291 [00:56<00:18, 4.63it/s]
Processing model.layers.30.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.30.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.42it/s][A
Processing model.layers.30.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.71it/s][A
Merging models: 71%|███████ | 206/291 [00:56<00:19, 4.38it/s]
Processing model.layers.23.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.23.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.85it/s][A
Processing model.layers.23.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.73it/s][A
Merging models: 71%|███████ | 207/291 [00:57<00:24, 3.49it/s]
Processing model.layers.17.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.17.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.31it/s][A
Processing model.layers.17.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.69it/s][A
Merging models: 71%|███████▏ | 208/291 [00:57<00:23, 3.60it/s]
Processing model.layers.8.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.8.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.91it/s][A
Processing model.layers.8.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.71it/s][A
Merging models: 72%|███████▏ | 209/291 [00:57<00:26, 3.11it/s]
Processing model.layers.17.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.17.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.66it/s][A
Merging models: 72%|███████▏ | 210/291 [00:57<00:22, 3.56it/s]
Processing model.layers.1.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.1.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s][A
Merging models: 73%|███████▎ | 211/291 [00:58<00:20, 3.99it/s]
Processing model.layers.11.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.11.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.46it/s][A
Merging models: 73%|███████▎ | 212/291 [00:58<00:18, 4.29it/s]
Processing model.layers.28.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.28.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.58it/s][A
Merging models: 73%|███████▎ | 213/291 [00:58<00:17, 4.55it/s]
Processing model.layers.4.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.4.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s][A
Merging models: 74%|███████▎ | 214/291 [00:58<00:15, 4.82it/s]
Processing model.layers.21.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.21.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.10it/s][A
Processing model.layers.21.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.79it/s][A
Merging models: 74%|███████▍ | 215/291 [00:59<00:20, 3.71it/s]
Processing model.layers.30.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.30.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.50it/s][A
Merging models: 74%|███████▍ | 216/291 [00:59<00:18, 4.06it/s]
Processing model.layers.3.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.3.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.14it/s][A
Merging models: 75%|███████▍ | 217/291 [00:59<00:16, 4.42it/s]
Processing model.layers.19.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.19.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.18it/s][A
Merging models: 75%|███████▍ | 218/291 [00:59<00:15, 4.71it/s]
Processing model.layers.9.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.9.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.82it/s][A
Processing model.layers.9.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.51it/s][A
Merging models: 75%|███████▌ | 219/291 [00:59<00:16, 4.38it/s]
Processing model.layers.9.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.9.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.90it/s][A
Processing model.layers.9.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.77it/s][A
Merging models: 76%|███████▌ | 220/291 [01:00<00:20, 3.51it/s]
Processing model.layers.30.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.30.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.04it/s][A
Processing model.layers.30.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.73it/s][A
Merging models: 76%|███████▌ | 221/291 [01:00<00:22, 3.07it/s]
Processing model.layers.6.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.6.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.17it/s][A
Processing model.layers.6.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.60it/s][A
Merging models: 76%|███████▋ | 222/291 [01:00<00:21, 3.27it/s]
Processing model.layers.27.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.27.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.85it/s][A
Processing model.layers.27.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.70it/s][A
Merging models: 77%|███████▋ | 223/291 [01:01<00:23, 2.93it/s]
Processing model.layers.17.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.17.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.03it/s][A
Processing model.layers.17.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.75it/s][A
Merging models: 77%|███████▋ | 224/291 [01:01<00:24, 2.74it/s]
Processing model.layers.26.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.26.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.97it/s][A
Processing model.layers.26.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.72it/s][A
Merging models: 77%|███████▋ | 225/291 [01:02<00:25, 2.62it/s]
Processing model.layers.16.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.16.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.18it/s][A
Merging models: 78%|███████▊ | 226/291 [01:02<00:20, 3.12it/s]
Processing model.layers.10.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.10.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.02it/s][A
Processing model.layers.10.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.70it/s][A
Merging models: 78%|███████▊ | 227/291 [01:02<00:22, 2.85it/s]
Processing model.layers.23.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.23.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.72it/s][A
Processing model.layers.23.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.37it/s][A
Merging models: 78%|███████▊ | 228/291 [01:03<00:20, 3.06it/s]
Processing model.layers.3.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.3.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.81it/s][A
Processing model.layers.3.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.62it/s][A
Merging models: 79%|███████▊ | 229/291 [01:03<00:22, 2.79it/s]
Processing model.layers.24.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.24.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.98it/s][A
Processing model.layers.24.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.67it/s][A
Merging models: 79%|███████▉ | 230/291 [01:03<00:23, 2.65it/s]
Processing model.layers.14.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.14.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.19it/s][A
Merging models: 79%|███████▉ | 231/291 [01:04<00:19, 3.14it/s]
Processing model.layers.23.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.23.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.12it/s][A
Merging models: 80%|███████▉ | 232/291 [01:04<00:16, 3.61it/s]
Processing model.layers.10.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.10.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s][A
Merging models: 80%|████████ | 233/291 [01:04<00:14, 4.04it/s]
Processing model.layers.11.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.11.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.72it/s][A
Processing model.layers.11.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.38it/s][A
Merging models: 80%|████████ | 234/291 [01:04<00:14, 3.94it/s]
Processing model.layers.7.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.7.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.15it/s][A
Merging models: 81%|████████ | 235/291 [01:04<00:12, 4.31it/s]
Processing model.layers.14.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.14.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.84it/s][A
Processing model.layers.14.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.65it/s][A
Merging models: 81%|████████ | 236/291 [01:05<00:15, 3.44it/s]
Processing model.layers.5.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.5.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.52it/s][A
Merging models: 81%|████████▏ | 237/291 [01:05<00:14, 3.84it/s]
Processing model.layers.14.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.14.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.05it/s][A
Processing model.layers.14.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.72it/s][A
Merging models: 82%|████████▏ | 238/291 [01:06<00:16, 3.24it/s]
Processing model.layers.0.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.0.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.18it/s][A
Merging models: 82%|████████▏ | 239/291 [01:06<00:14, 3.71it/s]
Processing model.layers.17.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.17.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.98it/s][A
Processing model.layers.17.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.69it/s][A
Merging models: 82%|████████▏ | 240/291 [01:06<00:16, 3.17it/s]
Processing model.layers.25.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.25.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.99it/s][A
Processing model.layers.25.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.68it/s][A
Merging models: 83%|████████▎ | 241/291 [01:07<00:17, 2.87it/s]
Processing model.layers.5.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.5.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.96it/s][A
Processing model.layers.5.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.67it/s][A
Merging models: 83%|████████▎ | 242/291 [01:07<00:18, 2.69it/s]
Processing model.layers.18.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.18.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.96it/s][A
Processing model.layers.18.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.67it/s][A
Merging models: 84%|████████▎ | 243/291 [01:07<00:18, 2.58it/s]
Processing model.layers.22.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.22.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.76it/s][A
Processing model.layers.22.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.37it/s][A
Merging models: 84%|████████▍ | 244/291 [01:08<00:16, 2.84it/s]
Processing model.layers.10.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.10.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 10.99it/s][A
Merging models: 84%|████████▍ | 245/291 [01:08<00:13, 3.32it/s]
Processing model.layers.19.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.19.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.84it/s][A
Processing model.layers.19.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.64it/s][A
Merging models: 85%|████████▍ | 246/291 [01:08<00:15, 2.95it/s]
Processing model.layers.21.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.21.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.86it/s][A
Processing model.layers.21.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.57it/s][A
Merging models: 85%|████████▍ | 247/291 [01:09<00:16, 2.72it/s]
Processing model.layers.19.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.19.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.10it/s][A
Processing model.layers.19.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.45it/s][A
Merging models: 85%|████████▌ | 248/291 [01:09<00:14, 2.97it/s]
Processing model.layers.1.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.1.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.45it/s][A
Merging models: 86%|████████▌ | 249/291 [01:09<00:12, 3.41it/s]
Processing model.layers.9.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.9.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.36it/s][A
Merging models: 86%|████████▌ | 250/291 [01:09<00:10, 3.79it/s]
Processing model.layers.24.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.24.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.45it/s][A
Merging models: 86%|████████▋ | 251/291 [01:10<00:09, 4.13it/s]
Processing model.layers.8.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.8.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.10it/s][A
Processing model.layers.8.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.72it/s][A
Merging models: 87%|████████▋ | 252/291 [01:10<00:11, 3.39it/s]
Processing model.layers.8.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.8.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.14it/s][A
Merging models: 87%|████████▋ | 253/291 [01:10<00:09, 3.84it/s]
Processing model.layers.28.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.28.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.57it/s][A
Processing model.layers.28.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.28it/s][A
Merging models: 87%|████████▋ | 254/291 [01:10<00:09, 3.78it/s]
Processing model.layers.1.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.1.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 9.52it/s][A
Merging models: 88%|████████▊ | 255/291 [01:11<00:08, 4.03it/s]
Processing model.layers.13.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.13.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.71it/s][A
Processing model.layers.13.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.39it/s][A
Merging models: 88%|████████▊ | 256/291 [01:11<00:08, 3.93it/s]
Processing model.layers.9.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.9.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.58it/s][A
Processing model.layers.9.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.32it/s][A
Merging models: 88%|████████▊ | 257/291 [01:11<00:08, 3.85it/s]
Processing model.layers.5.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.5.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 3.27it/s][A
Processing model.layers.5.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 3.87it/s][A
Merging models: 89%|████████▊ | 258/291 [01:12<00:11, 2.93it/s]
Processing model.layers.30.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.30.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.66it/s][A
Processing model.layers.30.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.35it/s][A
Merging models: 89%|████████▉ | 259/291 [01:12<00:10, 3.12it/s]
Processing model.layers.5.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.5.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.15it/s][A
Merging models: 89%|████████▉ | 260/291 [01:12<00:08, 3.60it/s]
Processing model.layers.18.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.18.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.56it/s][A
Processing model.layers.18.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 6.63it/s][A
Merging models: 90%|████████▉ | 261/291 [01:12<00:08, 3.52it/s]
Processing model.layers.20.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.20.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.06it/s][A
Processing model.layers.20.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.11it/s][A
Merging models: 90%|█████████ | 262/291 [01:13<00:08, 3.53it/s]
Processing model.layers.21.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.21.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.76it/s][A
Processing model.layers.21.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.55it/s][A
Merging models: 90%|█████████ | 263/291 [01:13<00:09, 3.04it/s]
Processing model.layers.22.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.22.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.16it/s][A
Merging models: 91%|█████████ | 264/291 [01:13<00:07, 3.52it/s]
Processing model.layers.4.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.4.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.88it/s][A
Processing model.layers.4.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 3.50it/s][A
Merging models: 91%|█████████ | 265/291 [01:14<00:09, 2.75it/s]
Processing model.layers.18.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.18.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.19it/s][A
Merging models: 91%|█████████▏| 266/291 [01:14<00:07, 3.25it/s]
Processing model.layers.22.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.22.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.17it/s][A
Merging models: 92%|█████████▏| 267/291 [01:14<00:06, 3.71it/s]
Processing model.layers.16.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.16.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.65it/s][A
Processing model.layers.16.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.29it/s][A
Merging models: 92%|█████████▏| 268/291 [01:15<00:06, 3.70it/s]
Processing model.layers.7.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.7.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.65it/s][A
Processing model.layers.7.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.50it/s][A
Merging models: 92%|█████████▏| 269/291 [01:15<00:07, 3.10it/s]
Processing model.layers.13.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.13.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.56it/s][A
Merging models: 93%|█████████▎| 270/291 [01:15<00:05, 3.54it/s]
Processing model.layers.29.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.29.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.60it/s][A
Processing model.layers.29.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.24it/s][A
Merging models: 93%|█████████▎| 271/291 [01:15<00:05, 3.57it/s]
Processing model.layers.8.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.8.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.66it/s][A
Processing model.layers.8.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.36it/s][A
Merging models: 93%|█████████▎| 272/291 [01:16<00:05, 3.61it/s]
Processing model.layers.27.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.27.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.79it/s][A
Processing model.layers.27.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.62it/s][A
Merging models: 94%|█████████▍| 273/291 [01:16<00:05, 3.09it/s]
Processing model.layers.13.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.13.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.17it/s][A
Merging models: 94%|█████████▍| 274/291 [01:16<00:04, 3.57it/s]
Processing model.layers.16.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.16.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.54it/s][A
Merging models: 95%|█████████▍| 275/291 [01:16<00:04, 3.95it/s]
Processing model.layers.15.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.15.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s][A
Merging models: 95%|█████████▍| 276/291 [01:17<00:03, 4.33it/s]
Processing model.layers.2.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.2.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.95it/s][A
Processing model.layers.2.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 3.44it/s][A
Merging models: 95%|█████████▌| 277/291 [01:17<00:04, 3.05it/s]
Processing model.layers.3.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.3.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.81it/s][A
Processing model.layers.3.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.64it/s][A
Merging models: 96%|█████████▌| 278/291 [01:18<00:04, 2.79it/s]
Processing model.layers.4.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.4.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.74it/s][A
Processing model.layers.4.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.37it/s][A
Merging models: 96%|█████████▌| 279/291 [01:18<00:03, 3.01it/s]
Processing model.layers.23.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.23.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.63it/s][A
Processing model.layers.23.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.34it/s][A
Merging models: 96%|█████████▌| 280/291 [01:18<00:03, 3.19it/s]
Processing model.layers.31.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s][A
Processing model.layers.31.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s][A
|
||
|
||
[A
|
||
Merging models: 97%|█████████▋| 281/291 [01:18<00:02, 3.66it/s]
|
||
|
||
Processing model.layers.12.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
|
||
|
||
Processing model.layers.12.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.81it/s]
|
||
|
||
Processing model.layers.12.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.60it/s]
|
||
|
||
Merging models: 97%|█████████▋| 282/291 [01:19<00:02, 3.12it/s]
|
||
|
||
Processing model.layers.7.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
|
||
|
||
Processing model.layers.7.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.89it/s]
|
||
|
||
Processing model.layers.7.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.64it/s]
|
||
|
||
Merging models: 97%|█████████▋| 283/291 [01:19<00:02, 2.83it/s]
|
||
|
||
Processing model.layers.9.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
|
||
|
||
Processing model.layers.9.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.94it/s]
|
||
|
||
Processing model.layers.9.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 3.43it/s]
|
||
|
||
Merging models: 98%|█████████▊| 284/291 [01:20<00:02, 2.41it/s]
|
||
|
||
Processing model.layers.15.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
|
||
|
||
Processing model.layers.15.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.71it/s]
|
||
|
||
Processing model.layers.15.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.35it/s]
|
||
|
||
Merging models: 98%|█████████▊| 285/291 [01:20<00:02, 2.34it/s]
|
||
|
||
Processing model.layers.26.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
|
||
|
||
Processing model.layers.26.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.75it/s]
|
||
|
||
Processing model.layers.26.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 6.51it/s]
|
||
|
||
Merging models: 98%|█████████▊| 286/291 [01:21<00:01, 2.55it/s]
|
||
|
||
Processing model.layers.29.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
|
||
|
||
Processing model.layers.29.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.51it/s]
|
||
|
||
Merging models: 99%|█████████▊| 287/291 [01:21<00:01, 3.01it/s]
|
||
|
||
Processing model.layers.5.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
|
||
|
||
Processing model.layers.5.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.60it/s]
|
||
|
||
Processing model.layers.5.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.29it/s]
|
||
|
||
Merging models: 99%|█████████▉| 288/291 [01:21<00:00, 3.18it/s]
|
||
|
||
Processing model.layers.18.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
|
||
|
||
Processing model.layers.18.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.02it/s]
|
||
|
||
Processing model.layers.18.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.48it/s]
|
||
|
||
Merging models: 99%|█████████▉| 289/291 [01:21<00:00, 3.34it/s]
|
||
|
||
Processing model.layers.2.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
|
||
|
||
Processing model.layers.2.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.27it/s]
|
||
|
||
Merging models: 100%|█████████▉| 290/291 [01:21<00:00, 3.73it/s]
|
||
|
||
Processing model.layers.3.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
|
||
|
||
Processing model.layers.3.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.83it/s]
|
||
|
||
Processing model.layers.3.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.56it/s]
|
||
|
||
Merging models: 100%|██████████| 291/291 [01:22<00:00, 3.75it/s]
|
||
Merging models: 100%|██████████| 291/291 [01:22<00:00, 3.54it/s]
|
||
create a temporary file to store mixed weights: /tmp/tmpp5xfq8r2.ckpt
|
||
***weight for each model***:
|
||
/media/hangyu5/Home/Documents/Hugging-Face/LM_cocktail/Mistral-7B-Instruct-v0.2 0.5
|
||
/media/hangyu5/Home/Documents/Hugging-Face/LM_cocktail/xDAN-L1-Chat-RL-v1 0.5
|
||
|
||
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.embed_tokens.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
|
||
Loading checkpoint shards: 33%|███▎ | 1/3 [00:00<00:00, 9.42it/s]
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:00<00:00, 9.31it/s]
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.norm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for lm_head.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
|
||
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
|
||
|
||
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00, 9.45it/s]
|
||
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00, 9.42it/s]
|
||
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
|
||
Using pad_token, but it is not set yet.
|
||
Using pad_token, but it is not set yet.
|
||
Remove temporary file: /tmp/tmpp5xfq8r2.ckpt
|
||
Remove temporary directory: /tmp/tmpc0jeswnf
|
||
Saving the new model to ./mixed_llm
|
||
MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
      )
    )
    (norm): MistralRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)
|