LMCocktail-Mistral-7B-v1/scripts/merge.log
ModelHub XC 3b76e11ea3 Initialize project; model provided by the ModelHub XC community
Model: Yhyu13/LMCocktail-Mistral-7B-v1
Source: Original Platform
2026-05-25 07:36:17 +08:00

1667 lines
281 KiB
Plaintext

create a temporary directory: /tmp/tmpc0jeswnf
loading /media/hangyu5/Home/Documents/Hugging-Face/LM_cocktail/Mistral-7B-Instruct-v0.2 -----------------
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:00<00:01, 1.64it/s]
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:01<00:00, 1.63it/s]
Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00, 1.69it/s]
Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00, 1.67it/s]
loading /media/hangyu5/Home/Documents/Hugging-Face/LM_cocktail/xDAN-L1-Chat-RL-v1 -----------------
Loading checkpoint shards: 0%| | 0/33 [00:00<?, ?it/s]
Loading checkpoint shards: 3%|▎ | 1/33 [00:00<00:03, 9.51it/s]
Loading checkpoint shards: 6%|▌ | 2/33 [00:00<00:03, 9.76it/s]
Loading checkpoint shards: 12%|█▏ | 4/33 [00:00<00:02, 9.97it/s]
Loading checkpoint shards: 15%|█▌ | 5/33 [00:00<00:02, 9.93it/s]
Loading checkpoint shards: 21%|██ | 7/33 [00:00<00:02, 10.00it/s]
Loading checkpoint shards: 24%|██▍ | 8/33 [00:00<00:02, 9.97it/s]
Loading checkpoint shards: 27%|██▋ | 9/33 [00:00<00:02, 9.55it/s]
Loading checkpoint shards: 30%|███ | 10/33 [00:01<00:02, 9.30it/s]
Loading checkpoint shards: 33%|███▎ | 11/33 [00:01<00:02, 9.34it/s]
Loading checkpoint shards: 39%|███▉ | 13/33 [00:01<00:02, 9.86it/s]
Loading checkpoint shards: 45%|████▌ | 15/33 [00:01<00:01, 10.14it/s]
Loading checkpoint shards: 52%|█████▏ | 17/33 [00:01<00:01, 10.32it/s]
Loading checkpoint shards: 58%|█████▊ | 19/33 [00:01<00:01, 10.43it/s]
Loading checkpoint shards: 64%|██████▎ | 21/33 [00:02<00:01, 10.50it/s]
Loading checkpoint shards: 70%|██████▉ | 23/33 [00:02<00:00, 10.57it/s]
Loading checkpoint shards: 76%|███████▌ | 25/33 [00:02<00:00, 10.61it/s]
Loading checkpoint shards: 82%|████████▏ | 27/33 [00:02<00:00, 10.66it/s]
Loading checkpoint shards: 88%|████████▊ | 29/33 [00:02<00:00, 10.70it/s]
Loading checkpoint shards: 94%|█████████▍| 31/33 [00:03<00:00, 10.72it/s]
Loading checkpoint shards: 100%|██████████| 33/33 [00:03<00:00, 10.65it/s]
Loading checkpoint shards: 100%|██████████| 33/33 [00:03<00:00, 10.30it/s]
Merging models: 0%| | 0/291 [00:00<?, ?it/s]
Processing model.layers.26.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.26.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.61it/s]
Processing model.layers.26.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.22it/s]

Merging models: 0%| | 1/291 [00:00<01:19, 3.63it/s]
Processing model.layers.5.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.5.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.29it/s]
Processing model.layers.5.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.10it/s]

Merging models: 1%| | 2/291 [00:00<01:20, 3.59it/s]
Processing model.layers.16.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.16.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.78it/s]

Merging models: 1%| | 3/291 [00:00<01:08, 4.23it/s]
Processing model.layers.25.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.25.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.30it/s]
Processing model.layers.25.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 6.75it/s]

Merging models: 1%|▏ | 4/291 [00:01<01:14, 3.86it/s]
Processing model.layers.18.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.18.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.84it/s]

Merging models: 2%|▏ | 5/291 [00:01<01:06, 4.31it/s]
Processing model.layers.12.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.12.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.37it/s]
Processing model.layers.12.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 9.09it/s]

Merging models: 2%|▏ | 6/291 [00:01<01:04, 4.39it/s]
Processing model.layers.14.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.14.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.32it/s]
Processing model.layers.14.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.10it/s]

Merging models: 2%|▏ | 7/291 [00:01<01:09, 4.08it/s]
Processing model.layers.11.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.11.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.24it/s]
Processing model.layers.11.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.03it/s]

Merging models: 3%|▎ | 8/291 [00:02<01:12, 3.88it/s]
Processing model.layers.26.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.26.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.30it/s]
Processing model.layers.26.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 5.56it/s]

Merging models: 3%|▎ | 9/291 [00:02<01:20, 3.50it/s]
Processing model.layers.5.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.5.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.27it/s]

Merging models: 3%|▎ | 10/291 [00:02<01:10, 3.96it/s]
Processing model.layers.27.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.27.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.27it/s]

Merging models: 4%|▍ | 11/291 [00:02<01:04, 4.36it/s]
Processing model.layers.3.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.3.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.06it/s]
Processing model.layers.3.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.77it/s]

Merging models: 4%|▍ | 12/291 [00:03<01:19, 3.49it/s]
Processing model.layers.30.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.30.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.23it/s]

Merging models: 4%|▍ | 13/291 [00:03<01:10, 3.94it/s]
Processing model.layers.6.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.6.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.05it/s]
Processing model.layers.6.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.78it/s]

Merging models: 5%|▍ | 14/291 [00:03<01:23, 3.30it/s]
Processing model.layers.13.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.13.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.25it/s]

Merging models: 5%|▌ | 15/291 [00:03<01:13, 3.77it/s]
Processing model.layers.28.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.28.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.13it/s]
Processing model.layers.28.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 8.94it/s]

Merging models: 5%|▌ | 16/291 [00:04<01:09, 3.96it/s]
Processing model.layers.19.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.19.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.74it/s]

Merging models: 6%|▌ | 17/291 [00:04<01:03, 4.30it/s]
Processing model.layers.25.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.25.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s]

Merging models: 6%|▌ | 18/291 [00:04<00:59, 4.62it/s]
Processing model.layers.20.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.20.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.99it/s]
Processing model.layers.20.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.72it/s]

Merging models: 7%|▋ | 19/291 [00:04<01:15, 3.60it/s]
Processing model.layers.3.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.3.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.17it/s]
Processing model.layers.3.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 8.94it/s]

Merging models: 7%|▋ | 20/291 [00:05<01:10, 3.82it/s]
Processing model.layers.28.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.28.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.02it/s]
Processing model.layers.28.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.75it/s]

Merging models: 7%|▋ | 21/291 [00:05<01:23, 3.24it/s]
Processing model.layers.10.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.10.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.16it/s]
Processing model.layers.10.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 8.98it/s]

Merging models: 8%|▊ | 22/291 [00:05<01:16, 3.54it/s]
Processing model.layers.20.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.20.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.10it/s]
Processing model.layers.20.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.79it/s]

Merging models: 8%|▊ | 23/291 [00:06<01:26, 3.10it/s]
Processing model.layers.23.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.23.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s]

Merging models: 8%|▊ | 24/291 [00:06<01:14, 3.58it/s]
Processing model.layers.8.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.8.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.70it/s]

Merging models: 9%|▊ | 25/291 [00:06<01:06, 3.97it/s]
Processing model.layers.3.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.3.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.73it/s]

Merging models: 9%|▉ | 26/291 [00:06<01:01, 4.31it/s]
Processing model.layers.7.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.7.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.62it/s]

Merging models: 9%|▉ | 27/291 [00:06<00:57, 4.56it/s]
Processing model.layers.25.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.25.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.17it/s]
Processing model.layers.25.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 8.92it/s]

Merging models: 10%|▉ | 28/291 [00:07<00:57, 4.54it/s]
Processing model.layers.26.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.26.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.80it/s]
Processing model.layers.26.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 8.98it/s]

Merging models: 10%|▉ | 29/291 [00:07<00:58, 4.51it/s]
Processing model.layers.17.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.17.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.68it/s]

Merging models: 10%|█ | 30/291 [00:07<00:55, 4.73it/s]
Processing model.layers.13.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.13.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.34it/s]
Processing model.layers.13.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 9.04it/s]

Merging models: 11%|█ | 31/291 [00:07<00:55, 4.67it/s]
Processing model.layers.28.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.28.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.02it/s]
Processing model.layers.28.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.75it/s]

Merging models: 11%|█ | 32/291 [00:08<01:11, 3.63it/s]
Processing model.layers.28.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.28.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.72it/s]

Merging models: 11%|█▏ | 33/291 [00:08<01:04, 4.02it/s]
Processing model.layers.6.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.6.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.23it/s]

Merging models: 12%|█▏ | 34/291 [00:08<00:58, 4.39it/s]
Processing model.layers.4.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.4.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.71it/s]

Merging models: 12%|█▏ | 35/291 [00:08<00:55, 4.64it/s]
Processing model.layers.4.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.4.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.32it/s]
Processing model.layers.4.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 9.28it/s]

Merging models: 12%|█▏ | 36/291 [00:08<00:54, 4.64it/s]
Processing model.layers.15.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.15.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.09it/s]
Processing model.layers.15.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.77it/s]

Merging models: 13%|█▎ | 37/291 [00:09<01:10, 3.63it/s]
Processing model.layers.19.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.19.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.29it/s]
Processing model.layers.19.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 9.00it/s]

Merging models: 13%|█▎ | 38/291 [00:09<01:05, 3.85it/s]
Processing model.layers.18.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.18.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.03it/s]
Processing model.layers.18.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.72it/s]

Merging models: 13%|█▎ | 39/291 [00:10<01:17, 3.25it/s]
Processing model.layers.11.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.11.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.07it/s]
Processing model.layers.11.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.75it/s]

Merging models: 14%|█▎ | 40/291 [00:10<01:25, 2.94it/s]
Processing model.layers.14.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.14.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.15it/s]
Processing model.layers.14.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 9.09it/s]

Merging models: 14%|█▍ | 41/291 [00:10<01:16, 3.28it/s]
Processing model.layers.6.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.6.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.76it/s]

Merging models: 14%|█▍ | 42/291 [00:10<01:07, 3.72it/s]
Processing model.layers.2.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.2.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s]

Merging models: 15%|█▍ | 43/291 [00:11<01:00, 4.13it/s]
Processing model.layers.18.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.18.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.75it/s]

Merging models: 15%|█▌ | 44/291 [00:11<00:55, 4.44it/s]
Processing model.layers.12.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.12.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.63it/s]

Merging models: 15%|█▌ | 45/291 [00:11<00:52, 4.67it/s]
Processing model.layers.0.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.0.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.70it/s]

Merging models: 16%|█▌ | 46/291 [00:11<00:50, 4.85it/s]
Processing model.layers.0.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.0.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.02it/s]
Processing model.layers.0.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.71it/s]

Merging models: 16%|█▌ | 47/291 [00:12<01:06, 3.70it/s]
Processing model.layers.10.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.10.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.01it/s]
Processing model.layers.10.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.74it/s]

Merging models: 16%|█▋ | 48/291 [00:12<01:16, 3.17it/s]
Processing model.layers.15.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.15.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.79it/s]

Merging models: 17%|█▋ | 49/291 [00:12<01:06, 3.62it/s]
Processing model.layers.24.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.24.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.74it/s]

Merging models: 17%|█▋ | 50/291 [00:12<01:00, 4.01it/s]
Processing model.layers.24.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.24.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.05it/s]
Processing model.layers.24.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 9.06it/s]

Merging models: 18%|█▊ | 51/291 [00:13<00:57, 4.15it/s]
Processing model.layers.1.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.1.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s]

Merging models: 18%|█▊ | 52/291 [00:13<00:53, 4.50it/s]
Processing model.layers.4.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.4.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.73it/s]

Merging models: 18%|█▊ | 53/291 [00:13<00:50, 4.72it/s]
Processing model.layers.1.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.1.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.98it/s]
Processing model.layers.1.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.72it/s]

Merging models: 19%|█▊ | 54/291 [00:13<01:05, 3.64it/s]
Processing model.layers.19.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.19.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.83it/s]

Merging models: 19%|█▉ | 55/291 [00:13<00:58, 4.04it/s]
Processing model.layers.16.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.16.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s]

Merging models: 19%|█▉ | 56/291 [00:14<00:53, 4.41it/s]
Processing model.layers.31.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.31.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.91it/s]
Processing model.layers.31.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.68it/s]

Merging models: 20%|█▉ | 57/291 [00:14<01:07, 3.49it/s]
Processing model.layers.17.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.17.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 10.91it/s]

Merging models: 20%|█▉ | 58/291 [00:14<00:59, 3.91it/s]
Processing model.layers.22.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.22.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.82it/s]

Merging models: 20%|██ | 59/291 [00:14<00:54, 4.26it/s]
Processing model.layers.29.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.29.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s]

Merging models: 21%|██ | 60/291 [00:15<00:50, 4.59it/s]
Processing model.layers.7.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.7.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.17it/s]
Processing model.layers.7.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 8.10it/s]

Merging models: 21%|██ | 61/291 [00:15<00:51, 4.44it/s]
Processing model.layers.30.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.30.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.80it/s]
Processing model.layers.30.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.64it/s]

Merging models: 21%|██▏ | 62/291 [00:15<01:05, 3.49it/s]
Processing model.layers.21.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.21.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.75it/s]

Merging models: 22%|██▏ | 63/291 [00:16<00:58, 3.90it/s]
Processing model.layers.26.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.26.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.75it/s]

Merging models: 22%|██▏ | 64/291 [00:16<00:53, 4.24it/s]
Processing model.layers.1.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.1.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.11it/s]
Processing model.layers.1.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.57it/s]

Merging models: 22%|██▏ | 65/291 [00:16<00:55, 4.11it/s]
Processing model.layers.24.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.24.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.81it/s]
Processing model.layers.24.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.66it/s]

Merging models: 23%|██▎ | 66/291 [00:16<01:07, 3.35it/s]
Processing model.layers.19.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.19.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s]

Merging models: 23%|██▎ | 67/291 [00:17<00:58, 3.80it/s]
Processing model.layers.25.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.25.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.79it/s]
Processing model.layers.25.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.48it/s]

Merging models: 23%|██▎ | 68/291 [00:17<00:58, 3.79it/s]
Processing model.layers.12.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.12.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.17it/s]
Processing model.layers.12.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.62it/s]

Merging models: 24%|██▎ | 69/291 [00:17<00:58, 3.81it/s]
Processing model.layers.31.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.31.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.85it/s]
Processing model.layers.31.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.69it/s]

Merging models: 24%|██▍ | 70/291 [00:18<01:08, 3.21it/s]
Processing model.layers.27.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.27.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.21it/s]
Processing model.layers.27.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.60it/s]

Merging models: 24%|██▍ | 71/291 [00:18<01:05, 3.38it/s]
Processing model.layers.28.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.28.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.25it/s]

Merging models: 25%|██▍ | 72/291 [00:18<00:57, 3.83it/s]
Processing model.layers.17.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.17.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.92it/s]
Processing model.layers.17.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.73it/s]

Merging models: 25%|██▌ | 73/291 [00:18<01:07, 3.24it/s]
Processing model.layers.7.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.7.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.27it/s]

Merging models: 25%|██▌ | 74/291 [00:19<00:58, 3.71it/s]
Processing model.layers.4.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.4.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.00it/s]
Processing model.layers.4.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.72it/s]

Merging models: 26%|██▌ | 75/291 [00:19<01:08, 3.18it/s]
Processing model.layers.18.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.18.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s]

Merging models: 26%|██▌ | 76/291 [00:19<00:58, 3.65it/s]
Processing model.layers.10.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.10.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.80it/s]
Processing model.layers.10.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.52it/s]

Merging models: 26%|██▋ | 77/291 [00:19<00:58, 3.69it/s]
Processing model.layers.6.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.6.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.49it/s]

Merging models: 27%|██▋ | 78/291 [00:20<00:52, 4.04it/s]
Processing model.layers.2.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.2.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.89it/s]
Processing model.layers.2.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.61it/s]

Merging models: 27%|██▋ | 79/291 [00:20<00:53, 3.97it/s]
Processing model.layers.22.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.22.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.14it/s]
Processing model.layers.22.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.63it/s]

Merging models: 27%|██▋ | 80/291 [00:20<00:53, 3.93it/s]
Processing model.layers.7.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.7.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.58it/s]

Merging models: 28%|██▊ | 81/291 [00:20<00:49, 4.26it/s]
Processing model.layers.4.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.4.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.25it/s]

Merging models: 28%|██▊ | 82/291 [00:20<00:45, 4.59it/s]
Processing model.layers.31.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.31.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.74it/s]

Merging models: 29%|██▊ | 83/291 [00:21<00:43, 4.80it/s]
Processing model.layers.1.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.1.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.12it/s]
Processing model.layers.1.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.91it/s]

Merging models: 29%|██▉ | 84/291 [00:21<00:55, 3.74it/s]
Processing model.layers.29.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.29.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.12it/s]
Processing model.layers.29.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.82it/s]

Merging models: 29%|██▉ | 85/291 [00:22<01:04, 3.22it/s]
Processing model.layers.20.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.20.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.40it/s]

Merging models: 30%|██▉ | 86/291 [00:22<00:56, 3.63it/s]
Processing model.layers.20.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.20.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.23it/s]

Merging models: 30%|██▉ | 87/291 [00:22<00:50, 4.06it/s]
Processing model.layers.22.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.22.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.14it/s]
Processing model.layers.22.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.81it/s]

Merging models: 30%|███ | 88/291 [00:22<01:00, 3.38it/s]
Processing model.layers.23.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.23.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.75it/s]

Merging models: 31%|███ | 89/291 [00:22<00:53, 3.80it/s]
Processing model.layers.25.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.25.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.58it/s]

Merging models: 31%|███ | 90/291 [00:23<00:48, 4.15it/s]
Processing model.layers.19.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.19.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.14it/s]
Processing model.layers.19.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.86it/s]

Merging models: 31%|███▏ | 91/291 [00:23<00:58, 3.43it/s]
Processing model.layers.6.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.6.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.26it/s]

Merging models: 32%|███▏ | 92/291 [00:23<00:51, 3.89it/s]
Processing model.layers.21.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.21.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.99it/s]
Processing model.layers.21.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.68it/s]

Merging models: 32%|███▏ | 93/291 [00:24<00:51, 3.88it/s]
Processing model.layers.0.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.0.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.85it/s]
Processing model.layers.0.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.74it/s]

Merging models: 32%|███▏ | 94/291 [00:24<01:00, 3.26it/s]
Processing model.layers.8.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.8.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.60it/s]

Merging models: 33%|███▎ | 95/291 [00:24<00:53, 3.68it/s]
Processing model.layers.0.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.0.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.52it/s]
Processing model.layers.0.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.86it/s]

Merging models: 33%|███▎ | 96/291 [00:24<00:51, 3.76it/s]
Processing model.layers.21.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.21.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.25it/s]

Merging models: 33%|███▎ | 97/291 [00:25<00:46, 4.18it/s]
Processing model.layers.5.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.5.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.71it/s]

Merging models: 34%|███▎ | 98/291 [00:25<00:43, 4.47it/s]
Processing model.layers.1.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.1.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.05it/s]
Processing model.layers.1.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.84it/s]

Merging models: 34%|███▍ | 99/291 [00:25<00:53, 3.57it/s]
Processing model.layers.25.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.25.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.10it/s]
Processing model.layers.25.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.81it/s]

Merging models: 34%|███▍ | 100/291 [00:26<01:01, 3.13it/s]
Processing model.layers.26.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.26.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.26it/s]

Merging models: 35%|███▍ | 101/291 [00:26<00:52, 3.61it/s]
Processing model.layers.31.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.31.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.44it/s]
Processing model.layers.31.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.88it/s]

Merging models: 35%|███▌ | 102/291 [00:26<00:50, 3.71it/s]
Processing model.layers.11.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.11.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.26it/s]

Merging models: 35%|███▌ | 103/291 [00:26<00:45, 4.13it/s]
Processing model.layers.2.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.2.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.02it/s]
Processing model.layers.2.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.75it/s]

Merging models: 36%|███▌ | 104/291 [00:26<00:46, 4.05it/s]
Processing model.layers.25.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.25.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 8.84it/s]

Merging models: 36%|███▌ | 105/291 [00:27<00:44, 4.16it/s]
Processing model.layers.13.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.13.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.02it/s]
Processing model.layers.13.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.82it/s]

Merging models: 36%|███▋ | 106/291 [00:27<00:54, 3.42it/s]
Processing model.layers.4.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.4.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.02it/s]
Processing model.layers.4.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.78it/s]

Merging models: 37%|███▋ | 107/291 [00:27<01:00, 3.04it/s]
Processing model.layers.29.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.29.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.99it/s]
Processing model.layers.29.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.57it/s]

Merging models: 37%|███▋ | 108/291 [00:28<01:05, 2.78it/s]
Processing model.layers.20.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.20.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 10.10it/s]

Merging models: 37%|███▋ | 109/291 [00:28<00:56, 3.21it/s]
Processing model.layers.16.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.16.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.06it/s]
Processing model.layers.16.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.79it/s]

Merging models: 38%|███▊ | 110/291 [00:29<01:02, 2.92it/s]
Processing model.layers.23.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.23.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.09it/s]
Processing model.layers.23.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.80it/s]

Merging models: 38%|███▊ | 111/291 [00:29<01:05, 2.75it/s]
Processing model.layers.6.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.6.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.13it/s]
Processing model.layers.6.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.81it/s]

Merging models: 38%|███▊ | 112/291 [00:29<01:07, 2.64it/s]
Processing model.layers.29.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.29.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.24it/s]

Merging models: 39%|███▉ | 113/291 [00:30<00:56, 3.14it/s]
Processing model.layers.12.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.12.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.09it/s]

Merging models: 39%|███▉ | 114/291 [00:30<00:49, 3.61it/s]
Processing model.layers.12.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.12.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.46it/s]

Merging models: 40%|███▉ | 115/291 [00:30<00:44, 3.98it/s]
Processing model.layers.18.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.18.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.07it/s]
Processing model.layers.18.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.81it/s]

Merging models: 40%|███▉ | 116/291 [00:30<00:52, 3.33it/s]
Processing model.layers.26.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.26.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s]

Merging models: 40%|████ | 117/291 [00:30<00:45, 3.79it/s]
Processing model.embed_tokens.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.embed_tokens.weight: 50%|█████ | 1/2 [00:00<00:00, 2.18it/s]
Processing model.embed_tokens.weight: 100%|██████████| 2/2 [00:00<00:00, 2.58it/s]

Merging models: 41%|████ | 118/291 [00:31<01:13, 2.36it/s]
Processing model.layers.13.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.13.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.73it/s]

Merging models: 41%|████ | 119/291 [00:31<01:00, 2.83it/s]
Processing model.layers.10.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.10.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.16it/s]
Processing model.layers.10.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.87it/s]

Merging models: 41%|████ | 120/291 [00:32<01:03, 2.71it/s]
Processing model.layers.0.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.0.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.42it/s]
Processing model.layers.0.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.78it/s]

Merging models: 42%|████▏ | 121/291 [00:32<00:56, 2.99it/s]
Processing model.layers.29.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.29.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.60it/s]
Processing model.layers.29.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.64it/s]

Merging models: 42%|████▏ | 122/291 [00:33<01:01, 2.75it/s]
Processing model.layers.30.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.30.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.07it/s]
Processing model.layers.30.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.72it/s]

Merging models: 42%|████▏ | 123/291 [00:33<01:03, 2.63it/s]
Processing model.layers.12.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.12.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.10it/s]
Processing model.layers.12.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.75it/s]

Merging models: 43%|████▎ | 124/291 [00:33<01:05, 2.55it/s]
Processing model.layers.27.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.27.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.06it/s]

Merging models: 43%|████▎ | 125/291 [00:34<00:54, 3.04it/s]
Processing model.layers.22.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.22.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.06it/s]
Processing model.layers.22.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.59it/s]

Merging models: 43%|████▎ | 126/291 [00:34<00:59, 2.79it/s]
Processing model.layers.28.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.28.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.07it/s]
Processing model.layers.28.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.77it/s]

Merging models: 44%|████▎ | 127/291 [00:34<01:01, 2.66it/s]
Processing model.layers.29.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.29.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.64it/s]

Merging models: 44%|████▍ | 128/291 [00:35<00:52, 3.13it/s]
Processing model.layers.0.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.0.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.05it/s]
Processing model.layers.0.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.77it/s]

Merging models: 44%|████▍ | 129/291 [00:35<00:56, 2.87it/s]
Processing model.layers.27.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.27.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.09it/s]
Processing model.layers.27.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.79it/s]

Merging models: 45%|████▍ | 130/291 [00:35<00:59, 2.71it/s]
Processing model.layers.9.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.9.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.62it/s]

Merging models: 45%|████▌ | 131/291 [00:36<00:50, 3.18it/s]
Processing model.layers.17.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.17.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.03it/s]
Processing model.layers.17.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.60it/s]

Merging models: 45%|████▌ | 132/291 [00:36<00:47, 3.35it/s]
Processing model.layers.22.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.22.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.63it/s]

Merging models: 46%|████▌ | 133/291 [00:36<00:41, 3.77it/s]
Processing model.layers.9.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.9.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s]

Merging models: 46%|████▌ | 134/291 [00:36<00:37, 4.17it/s]
Processing model.layers.16.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.16.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.06it/s]
Processing model.layers.16.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.83it/s]

Merging models: 46%|████▋ | 135/291 [00:37<00:45, 3.43it/s]
Processing model.layers.21.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.21.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.76it/s]

Merging models: 47%|████▋ | 136/291 [00:37<00:40, 3.85it/s]
Processing model.layers.8.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.8.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.02it/s]
Processing model.layers.8.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.62it/s]

Merging models: 47%|████▋ | 137/291 [00:37<00:40, 3.84it/s]
Processing model.layers.31.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.31.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.91it/s]
Processing model.layers.31.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.75it/s]

Merging models: 47%|████▋ | 138/291 [00:38<00:47, 3.25it/s]
Processing model.layers.7.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.7.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.11it/s]
Processing model.layers.7.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.82it/s]

Merging models: 48%|████▊ | 139/291 [00:38<00:51, 2.95it/s]
Processing model.layers.15.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.15.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.12it/s]
Processing model.layers.15.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.81it/s]

Merging models: 48%|████▊ | 140/291 [00:38<00:54, 2.77it/s]
Processing model.layers.19.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.19.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.07it/s]
Processing model.layers.19.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.75it/s]

Merging models: 48%|████▊ | 141/291 [00:39<00:56, 2.65it/s]
Processing model.layers.20.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.20.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.62it/s]

Merging models: 49%|████▉ | 142/291 [00:39<00:47, 3.11it/s]
Processing model.layers.25.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.25.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.21it/s]

Merging models: 49%|████▉ | 143/291 [00:39<00:41, 3.59it/s]
Processing model.layers.26.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.26.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.50it/s]

Merging models: 49%|████▉ | 144/291 [00:39<00:37, 3.96it/s]
Processing model.layers.24.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.24.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.07it/s]
Processing model.layers.24.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.78it/s]

Merging models: 50%|████▉ | 145/291 [00:40<00:43, 3.32it/s]
Processing model.layers.1.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.1.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.93it/s]
Processing model.layers.1.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.65it/s]

Merging models: 50%|█████ | 146/291 [00:40<00:41, 3.46it/s]
Processing model.layers.24.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.24.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.96it/s]
Processing model.layers.24.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.64it/s]

Merging models: 51%|█████ | 147/291 [00:40<00:40, 3.57it/s]
Processing model.layers.15.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.15.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.29it/s]
Processing model.layers.15.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.79it/s]

Merging models: 51%|█████ | 148/291 [00:41<00:39, 3.67it/s]
Processing model.layers.2.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.2.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.90it/s]
Processing model.layers.2.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.76it/s]

Merging models: 51%|█████ | 149/291 [00:41<00:44, 3.16it/s]
Processing model.layers.10.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.10.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.60it/s]

Merging models: 52%|█████▏ | 150/291 [00:41<00:39, 3.59it/s]
Processing model.layers.11.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.11.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.28it/s]

Merging models: 52%|█████▏ | 151/291 [00:41<00:34, 4.03it/s]
Processing model.layers.31.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.31.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.39it/s]
Processing model.layers.31.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.81it/s]

Merging models: 52%|█████▏ | 152/291 [00:42<00:34, 4.00it/s]
Processing model.layers.11.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.11.self_attn.v_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.89it/s]
Processing model.layers.11.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.89it/s]

Merging models: 53%|█████▎ | 153/291 [00:42<00:34, 3.98it/s]
Processing model.layers.23.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.23.self_attn.v_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.12it/s]
Processing model.layers.23.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 8.37it/s]

Merging models: 53%|█████▎ | 154/291 [00:42<00:33, 4.03it/s]
Processing model.layers.12.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.12.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.96it/s]
Processing model.layers.12.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.74it/s]

Merging models: 53%|█████▎ | 155/291 [00:42<00:40, 3.34it/s]
Processing model.layers.22.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.22.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.98it/s]
Processing model.layers.22.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.67it/s]

Merging models: 54%|█████▎ | 156/291 [00:43<00:45, 2.97it/s]
Processing model.layers.30.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.30.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.02it/s]

Merging models: 54%|█████▍ | 157/291 [00:43<00:39, 3.38it/s]
Processing model.layers.2.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.2.self_attn.k_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 9.64it/s]
Processing model.layers.2.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 8.93it/s]

Merging models: 54%|█████▍ | 158/291 [00:43<00:36, 3.65it/s]
Processing model.layers.8.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.8.input_layernorm.weight: 50%|█████ | 1/2 [00:00<00:00, 8.69it/s]
Processing model.layers.8.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 8.62it/s]

Merging models: 55%|█████▍ | 159/291 [00:44<00:34, 3.83it/s]
Processing model.layers.12.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.12.post_attention_layernorm.weight: 50%|█████ | 1/2 [00:00<00:00, 8.66it/s]
Processing model.layers.12.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 8.63it/s]

Merging models: 55%|█████▍ | 160/291 [00:44<00:33, 3.96it/s]
Processing model.layers.17.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.17.input_layernorm.weight: 50%|█████ | 1/2 [00:00<00:00, 8.67it/s]
Processing model.layers.17.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 8.65it/s]

Merging models: 55%|█████▌ | 161/291 [00:44<00:32, 4.06it/s]
Processing model.layers.8.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.8.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.11it/s]
Processing model.layers.8.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.85it/s]

Merging models: 56%|█████▌ | 162/291 [00:44<00:38, 3.39it/s]
Processing model.layers.21.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.21.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s]

Merging models: 56%|█████▌ | 163/291 [00:45<00:33, 3.84it/s]
Processing model.layers.24.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.24.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.21it/s]

Merging models: 56%|█████▋ | 164/291 [00:45<00:29, 4.24it/s]
Processing model.layers.13.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.13.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.09it/s]
Processing model.layers.13.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.79it/s]

Merging models: 57%|█████▋ | 165/291 [00:45<00:36, 3.46it/s]
Processing model.layers.2.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.2.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.26it/s]

Merging models: 57%|█████▋ | 166/291 [00:45<00:31, 3.91it/s]
Processing model.layers.0.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.0.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.24it/s]

Merging models: 57%|█████▋ | 167/291 [00:46<00:28, 4.30it/s]
Processing lm_head.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing lm_head.weight: 50%|█████ | 1/2 [00:00<00:00, 2.22it/s]
Processing lm_head.weight: 100%|██████████| 2/2 [00:00<00:00, 2.56it/s]

Merging models: 58%|█████▊ | 168/291 [00:46<00:49, 2.48it/s]
Processing model.norm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.norm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s]

Merging models: 58%|█████▊ | 169/291 [00:47<00:40, 2.98it/s]
Processing model.layers.9.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.9.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.23it/s]
Processing model.layers.9.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.89it/s]

Merging models: 58%|█████▊ | 170/291 [00:47<00:43, 2.80it/s]
Processing model.layers.20.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.20.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.70it/s]
Processing model.layers.20.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.48it/s]

Merging models: 59%|█████▉ | 171/291 [00:47<00:39, 3.03it/s]
Processing model.layers.15.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.15.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s]

Merging models: 59%|█████▉ | 172/291 [00:47<00:33, 3.52it/s]
Processing model.layers.31.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.31.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.76it/s]

Merging models: 59%|█████▉ | 173/291 [00:48<00:30, 3.92it/s]
Processing model.layers.14.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.14.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.23it/s]

Merging models: 60%|█████▉ | 174/291 [00:48<00:27, 4.31it/s]
Processing model.layers.6.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.6.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.03it/s]
Processing model.layers.6.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.80it/s]

Merging models: 60%|██████ | 175/291 [00:48<00:33, 3.49it/s]
Processing model.layers.9.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.9.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.29it/s]

Merging models: 60%|██████ | 176/291 [00:48<00:29, 3.94it/s]
Processing model.layers.27.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.27.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.76it/s]

Merging models: 61%|██████ | 177/291 [00:49<00:26, 4.28it/s]
Processing model.layers.10.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.10.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.68it/s]

Merging models: 61%|██████ | 178/291 [00:49<00:24, 4.55it/s]
Processing model.layers.11.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.11.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.00it/s]
Processing model.layers.11.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.64it/s]

Merging models: 62%|██████▏ | 179/291 [00:49<00:25, 4.31it/s]
Processing model.layers.11.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.11.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.96it/s]
Processing model.layers.11.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.79it/s]

Merging models: 62%|██████▏ | 180/291 [00:49<00:31, 3.48it/s]
Processing model.layers.15.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.15.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.93it/s]
Processing model.layers.15.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.59it/s]

Merging models: 62%|██████▏ | 181/291 [00:50<00:30, 3.58it/s]
Processing model.layers.15.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.15.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.60it/s]

Merging models: 63%|██████▎ | 182/291 [00:50<00:27, 3.96it/s]
Processing model.layers.31.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.31.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.22it/s]

Merging models: 63%|██████▎ | 183/291 [00:50<00:24, 4.34it/s]
Processing model.layers.7.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.7.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.38it/s]
Processing model.layers.7.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.76it/s]

Merging models: 63%|██████▎ | 184/291 [00:50<00:25, 4.21it/s]
Processing model.layers.5.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.5.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.89it/s]
Processing model.layers.5.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.68it/s]

Merging models: 64%|██████▎ | 185/291 [00:51<00:25, 4.09it/s]
Processing model.layers.28.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.28.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s]

Merging models: 64%|██████▍ | 186/291 [00:51<00:23, 4.45it/s]
Processing model.layers.20.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.20.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.98it/s]
Processing model.layers.20.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.80it/s]

Merging models: 64%|██████▍ | 187/291 [00:51<00:29, 3.55it/s]
Processing model.layers.16.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.16.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.04it/s]
Processing model.layers.16.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.77it/s]

Merging models: 65%|██████▍ | 188/291 [00:52<00:33, 3.10it/s]
Processing model.layers.21.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.21.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.83it/s]
Processing model.layers.21.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.51it/s]

Merging models: 65%|██████▍ | 189/291 [00:52<00:31, 3.28it/s]
Processing model.layers.2.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.2.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.85it/s]
Processing model.layers.2.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.69it/s]

Merging models: 65%|██████▌ | 190/291 [00:52<00:34, 2.93it/s]
Processing model.layers.3.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.3.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.81it/s]

Merging models: 66%|██████▌ | 191/291 [00:52<00:29, 3.40it/s]
Processing model.layers.14.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.14.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.66it/s]

Merging models: 66%|██████▌ | 192/291 [00:53<00:25, 3.81it/s]
Processing model.layers.0.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.0.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.54it/s]

Merging models: 66%|██████▋ | 193/291 [00:53<00:23, 4.15it/s]
Processing model.layers.24.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.24.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.23it/s]

Merging models: 67%|██████▋ | 194/291 [00:53<00:21, 4.50it/s]
Processing model.layers.23.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.23.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.11it/s]
Processing model.layers.23.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.79it/s]

Merging models: 67%|██████▋ | 195/291 [00:53<00:26, 3.58it/s]
Processing model.layers.14.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.14.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.62it/s]

Merging models: 67%|██████▋ | 196/291 [00:54<00:23, 3.96it/s]
Processing model.layers.6.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.6.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.31it/s]
Processing model.layers.6.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.71it/s]

Merging models: 68%|██████▊ | 197/291 [00:54<00:23, 3.94it/s]
Processing model.layers.13.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.13.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.90it/s]
Processing model.layers.13.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.70it/s]

Merging models: 68%|██████▊ | 198/291 [00:54<00:28, 3.28it/s]
Processing model.layers.27.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.27.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.34it/s]
Processing model.layers.27.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.71it/s]

Merging models: 68%|██████▊ | 199/291 [00:55<00:26, 3.44it/s]
Processing model.layers.29.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.29.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.32it/s]
Processing model.layers.29.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.67it/s]

Merging models: 69%|██████▊ | 200/291 [00:55<00:25, 3.56it/s]
Processing model.layers.16.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.16.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.84it/s]
Processing model.layers.16.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.53it/s]

Merging models: 69%|██████▉ | 201/291 [00:55<00:24, 3.63it/s]
Processing model.layers.27.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.27.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.56it/s]

Merging models: 69%|██████▉ | 202/291 [00:55<00:22, 4.00it/s]
Processing model.layers.14.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.14.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.87it/s]
Processing model.layers.14.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.57it/s]

Merging models: 70%|██████▉ | 203/291 [00:56<00:22, 3.94it/s]
Processing model.layers.30.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.30.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.17it/s]

Merging models: 70%|███████ | 204/291 [00:56<00:20, 4.32it/s]
Processing model.layers.3.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.3.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.17it/s]

Merging models: 70%|███████ | 205/291 [00:56<00:18, 4.63it/s]
Processing model.layers.30.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.30.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.42it/s]
Processing model.layers.30.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.71it/s]

Merging models: 71%|███████ | 206/291 [00:56<00:19, 4.38it/s]
Processing model.layers.23.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.23.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.85it/s]
Processing model.layers.23.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.73it/s]

Merging models: 71%|███████ | 207/291 [00:57<00:24, 3.49it/s]
Processing model.layers.17.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.17.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.31it/s]
Processing model.layers.17.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.69it/s]

Merging models: 71%|███████▏ | 208/291 [00:57<00:23, 3.60it/s]
Processing model.layers.8.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.8.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.91it/s]
Processing model.layers.8.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.71it/s]

Merging models: 72%|███████▏ | 209/291 [00:57<00:26, 3.11it/s]
Processing model.layers.17.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.17.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.66it/s]

Merging models: 72%|███████▏ | 210/291 [00:57<00:22, 3.56it/s]
Processing model.layers.1.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.1.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s]

Merging models: 73%|███████▎ | 211/291 [00:58<00:20, 3.99it/s]
Processing model.layers.11.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.11.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.46it/s]

Merging models: 73%|███████▎ | 212/291 [00:58<00:18, 4.29it/s]
Processing model.layers.28.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.28.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.58it/s]

Merging models: 73%|███████▎ | 213/291 [00:58<00:17, 4.55it/s]
Processing model.layers.4.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.4.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s]

Merging models: 74%|███████▎ | 214/291 [00:58<00:15, 4.82it/s]
Processing model.layers.21.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.21.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.10it/s]
Processing model.layers.21.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.79it/s]

Merging models: 74%|███████▍ | 215/291 [00:59<00:20, 3.71it/s]
Processing model.layers.30.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.30.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.50it/s]

Merging models: 74%|███████▍ | 216/291 [00:59<00:18, 4.06it/s]
Processing model.layers.3.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.3.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.14it/s]

Merging models: 75%|███████▍ | 217/291 [00:59<00:16, 4.42it/s]
Processing model.layers.19.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.19.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.18it/s]

Merging models: 75%|███████▍ | 218/291 [00:59<00:15, 4.71it/s]
Processing model.layers.9.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.9.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.82it/s]
Processing model.layers.9.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.51it/s]

Merging models: 75%|███████▌ | 219/291 [00:59<00:16, 4.38it/s]
Processing model.layers.9.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.9.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.90it/s]
Processing model.layers.9.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.77it/s]

Merging models: 76%|███████▌ | 220/291 [01:00<00:20, 3.51it/s]
Processing model.layers.30.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.30.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.04it/s]
Processing model.layers.30.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.73it/s]

Merging models: 76%|███████▌ | 221/291 [01:00<00:22, 3.07it/s]
Processing model.layers.6.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.6.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.17it/s]
Processing model.layers.6.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.60it/s]

Merging models: 76%|███████▋ | 222/291 [01:00<00:21, 3.27it/s]
Processing model.layers.27.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.27.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.85it/s]
Processing model.layers.27.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.70it/s]

Merging models: 77%|███████▋ | 223/291 [01:01<00:23, 2.93it/s]
Processing model.layers.17.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.17.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.03it/s]
Processing model.layers.17.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.75it/s]

Merging models: 77%|███████▋ | 224/291 [01:01<00:24, 2.74it/s]
Processing model.layers.26.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.26.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.97it/s]
Processing model.layers.26.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.72it/s]

Merging models: 77%|███████▋ | 225/291 [01:02<00:25, 2.62it/s]
Processing model.layers.16.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.16.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.18it/s]

Merging models: 78%|███████▊ | 226/291 [01:02<00:20, 3.12it/s]
Processing model.layers.10.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.10.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.02it/s]
Processing model.layers.10.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.70it/s]

Merging models: 78%|███████▊ | 227/291 [01:02<00:22, 2.85it/s]
Processing model.layers.23.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.23.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.72it/s]
Processing model.layers.23.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.37it/s]

Merging models: 78%|███████▊ | 228/291 [01:03<00:20, 3.06it/s]
Processing model.layers.3.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.3.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.81it/s]
Processing model.layers.3.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.62it/s]

Merging models: 79%|███████▊ | 229/291 [01:03<00:22, 2.79it/s]
Processing model.layers.24.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.24.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.98it/s]
Processing model.layers.24.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.67it/s]

Merging models: 79%|███████▉ | 230/291 [01:03<00:23, 2.65it/s]
Processing model.layers.14.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.14.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.19it/s]

Merging models: 79%|███████▉ | 231/291 [01:04<00:19, 3.14it/s]
Processing model.layers.23.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.23.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.12it/s]

Merging models: 80%|███████▉ | 232/291 [01:04<00:16, 3.61it/s]
Processing model.layers.10.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.10.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s]

Merging models: 80%|████████ | 233/291 [01:04<00:14, 4.04it/s]
Processing model.layers.11.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.11.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.72it/s]
Processing model.layers.11.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.38it/s]

Merging models: 80%|████████ | 234/291 [01:04<00:14, 3.94it/s]
Processing model.layers.7.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.7.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.15it/s]

Merging models: 81%|████████ | 235/291 [01:04<00:12, 4.31it/s]
Processing model.layers.14.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.14.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.84it/s]
Processing model.layers.14.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.65it/s]

Merging models: 81%|████████ | 236/291 [01:05<00:15, 3.44it/s]
Processing model.layers.5.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.5.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.52it/s]

Merging models: 81%|████████▏ | 237/291 [01:05<00:14, 3.84it/s]
Processing model.layers.14.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.14.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.05it/s]
Processing model.layers.14.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.72it/s]

Merging models: 82%|████████▏ | 238/291 [01:06<00:16, 3.24it/s]
Processing model.layers.0.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.0.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.18it/s]

Merging models: 82%|████████▏ | 239/291 [01:06<00:14, 3.71it/s]
Processing model.layers.17.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.17.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.98it/s]
Processing model.layers.17.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.69it/s]

Merging models: 82%|████████▏ | 240/291 [01:06<00:16, 3.17it/s]
Processing model.layers.25.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.25.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.99it/s]
Processing model.layers.25.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.68it/s]

Merging models: 83%|████████▎ | 241/291 [01:07<00:17, 2.87it/s]
Processing model.layers.5.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.5.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.96it/s]
Processing model.layers.5.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.67it/s]

Merging models: 83%|████████▎ | 242/291 [01:07<00:18, 2.69it/s]
Processing model.layers.18.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.18.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.96it/s]
Processing model.layers.18.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.67it/s]

Merging models: 84%|████████▎ | 243/291 [01:07<00:18, 2.58it/s]
Processing model.layers.22.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.22.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.76it/s]
Processing model.layers.22.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.37it/s]

Merging models: 84%|████████▍ | 244/291 [01:08<00:16, 2.84it/s]
Processing model.layers.10.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.10.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 10.99it/s]

Merging models: 84%|████████▍ | 245/291 [01:08<00:13, 3.32it/s]
Processing model.layers.19.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.19.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.84it/s]
Processing model.layers.19.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.64it/s]

Merging models: 85%|████████▍ | 246/291 [01:08<00:15, 2.95it/s]
Processing model.layers.21.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.21.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.86it/s]
Processing model.layers.21.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.57it/s]

Merging models: 85%|████████▍ | 247/291 [01:09<00:16, 2.72it/s]
Processing model.layers.19.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.19.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.10it/s]
Processing model.layers.19.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.45it/s]

Merging models: 85%|████████▌ | 248/291 [01:09<00:14, 2.97it/s]
Processing model.layers.1.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.1.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.45it/s]

Merging models: 86%|████████▌ | 249/291 [01:09<00:12, 3.41it/s]
Processing model.layers.9.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.9.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.36it/s]

Merging models: 86%|████████▌ | 250/291 [01:09<00:10, 3.79it/s]
Processing model.layers.24.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.24.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.45it/s]

Merging models: 86%|████████▋ | 251/291 [01:10<00:09, 4.13it/s]
Processing model.layers.8.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.8.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.10it/s]
Processing model.layers.8.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.72it/s]

Merging models: 87%|████████▋ | 252/291 [01:10<00:11, 3.39it/s]
Processing model.layers.8.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.8.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.14it/s]

Merging models: 87%|████████▋ | 253/291 [01:10<00:09, 3.84it/s]
Processing model.layers.28.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.28.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.57it/s]
Processing model.layers.28.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.28it/s]

Merging models: 87%|████████▋ | 254/291 [01:10<00:09, 3.78it/s]
Processing model.layers.1.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.1.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 9.52it/s]

Merging models: 88%|████████▊ | 255/291 [01:11<00:08, 4.03it/s]
Processing model.layers.13.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.13.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.71it/s]
Processing model.layers.13.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.39it/s]

Merging models: 88%|████████▊ | 256/291 [01:11<00:08, 3.93it/s]
Processing model.layers.9.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.9.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.58it/s]
Processing model.layers.9.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.32it/s]

Merging models: 88%|████████▊ | 257/291 [01:11<00:08, 3.85it/s]
Processing model.layers.5.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.5.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 3.27it/s]
Processing model.layers.5.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 3.87it/s]

Merging models: 89%|████████▊ | 258/291 [01:12<00:11, 2.93it/s]
Processing model.layers.30.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.30.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.66it/s]
Processing model.layers.30.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.35it/s]

Merging models: 89%|████████▉ | 259/291 [01:12<00:10, 3.12it/s]
Processing model.layers.5.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.5.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.15it/s]

Merging models: 89%|████████▉ | 260/291 [01:12<00:08, 3.60it/s]
Processing model.layers.18.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.18.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.56it/s]
Processing model.layers.18.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 6.63it/s]

Merging models: 90%|████████▉ | 261/291 [01:12<00:08, 3.52it/s]
Processing model.layers.20.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.20.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.06it/s]
Processing model.layers.20.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.11it/s]

Merging models: 90%|█████████ | 262/291 [01:13<00:08, 3.53it/s]
Processing model.layers.21.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.21.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.76it/s]
Processing model.layers.21.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.55it/s]

Merging models: 90%|█████████ | 263/291 [01:13<00:09, 3.04it/s]
Processing model.layers.22.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.22.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.16it/s]

Merging models: 91%|█████████ | 264/291 [01:13<00:07, 3.52it/s]
Processing model.layers.4.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.4.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.88it/s]
Processing model.layers.4.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 3.50it/s]

Merging models: 91%|█████████ | 265/291 [01:14<00:09, 2.75it/s]
Processing model.layers.18.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.18.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.19it/s]

Merging models: 91%|█████████▏| 266/291 [01:14<00:07, 3.25it/s]
Processing model.layers.22.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.22.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.17it/s]

Merging models: 92%|█████████▏| 267/291 [01:14<00:06, 3.71it/s]
Processing model.layers.16.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.16.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.65it/s]
Processing model.layers.16.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.29it/s]

Merging models: 92%|█████████▏| 268/291 [01:15<00:06, 3.70it/s]
Processing model.layers.7.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.7.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.65it/s]
Processing model.layers.7.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.50it/s]

Merging models: 92%|█████████▏| 269/291 [01:15<00:07, 3.10it/s]
Processing model.layers.13.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.13.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.56it/s]

Merging models: 93%|█████████▎| 270/291 [01:15<00:05, 3.54it/s]
Processing model.layers.29.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.29.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.60it/s]
Processing model.layers.29.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.24it/s]

Merging models: 93%|█████████▎| 271/291 [01:15<00:05, 3.57it/s]
Processing model.layers.8.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.8.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.66it/s]
Processing model.layers.8.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.36it/s]

Merging models: 93%|█████████▎| 272/291 [01:16<00:05, 3.61it/s]
Processing model.layers.27.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.27.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.79it/s]
Processing model.layers.27.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.62it/s]

Merging models: 94%|█████████▍| 273/291 [01:16<00:05, 3.09it/s]
Processing model.layers.13.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.13.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.17it/s]

Merging models: 94%|█████████▍| 274/291 [01:16<00:04, 3.57it/s]
Processing model.layers.16.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.16.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.54it/s]

Merging models: 95%|█████████▍| 275/291 [01:16<00:04, 3.95it/s]
Processing model.layers.15.input_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.15.input_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s]

Merging models: 95%|█████████▍| 276/291 [01:17<00:03, 4.33it/s]
Processing model.layers.2.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.2.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.95it/s]
Processing model.layers.2.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 3.44it/s]

Merging models: 95%|█████████▌| 277/291 [01:17<00:04, 3.05it/s]
Processing model.layers.3.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.3.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.81it/s]
Processing model.layers.3.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.64it/s]

Merging models: 96%|█████████▌| 278/291 [01:18<00:04, 2.79it/s]
Processing model.layers.4.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.4.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.74it/s]
Processing model.layers.4.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.37it/s]

Merging models: 96%|█████████▌| 279/291 [01:18<00:03, 3.01it/s]
Processing model.layers.23.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.23.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.63it/s]
Processing model.layers.23.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.34it/s]

Merging models: 96%|█████████▌| 280/291 [01:18<00:03, 3.19it/s]
Processing model.layers.31.post_attention_layernorm.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.31.post_attention_layernorm.weight: 100%|██████████| 2/2 [00:00<00:00, 11.20it/s]

Merging models: 97%|█████████▋| 281/291 [01:18<00:02, 3.66it/s]
Processing model.layers.12.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.12.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.81it/s]
Processing model.layers.12.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.60it/s]

Merging models: 97%|█████████▋| 282/291 [01:19<00:02, 3.12it/s]
Processing model.layers.7.mlp.down_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.7.mlp.down_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.89it/s]
Processing model.layers.7.mlp.down_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.64it/s]

Merging models: 97%|█████████▋| 283/291 [01:19<00:02, 2.83it/s]
Processing model.layers.9.mlp.up_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.9.mlp.up_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.94it/s]
Processing model.layers.9.mlp.up_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 3.43it/s]

Merging models: 98%|█████████▊| 284/291 [01:20<00:02, 2.41it/s]
Processing model.layers.15.mlp.gate_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.15.mlp.gate_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 4.71it/s]
Processing model.layers.15.mlp.gate_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 4.35it/s]

Merging models: 98%|█████████▊| 285/291 [01:20<00:02, 2.34it/s]
Processing model.layers.26.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.26.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 5.75it/s]
Processing model.layers.26.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 6.51it/s]

Merging models: 98%|█████████▊| 286/291 [01:21<00:01, 2.55it/s]
Processing model.layers.29.self_attn.k_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.29.self_attn.k_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.51it/s]

Merging models: 99%|█████████▊| 287/291 [01:21<00:01, 3.01it/s]
Processing model.layers.5.self_attn.q_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.5.self_attn.q_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.60it/s]
Processing model.layers.5.self_attn.q_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.29it/s]

Merging models: 99%|█████████▉| 288/291 [01:21<00:00, 3.18it/s]
Processing model.layers.18.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.18.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 8.02it/s]
Processing model.layers.18.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.48it/s]

Merging models: 99%|█████████▉| 289/291 [01:21<00:00, 3.34it/s]
Processing model.layers.2.self_attn.v_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.2.self_attn.v_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 10.27it/s]

Merging models: 100%|█████████▉| 290/291 [01:21<00:00, 3.73it/s]
Processing model.layers.3.self_attn.o_proj.weight: 0%| | 0/2 [00:00<?, ?it/s]
Processing model.layers.3.self_attn.o_proj.weight: 50%|█████ | 1/2 [00:00<00:00, 7.83it/s]
Processing model.layers.3.self_attn.o_proj.weight: 100%|██████████| 2/2 [00:00<00:00, 7.56it/s]

Merging models: 100%|██████████| 291/291 [01:22<00:00, 3.75it/s]
Merging models: 100%|██████████| 291/291 [01:22<00:00, 3.54it/s]
create a temporary file to store mixed weights: /tmp/tmpp5xfq8r2.ckpt
***weight for each model***:
/media/hangyu5/Home/Documents/Hugging-Face/LM_cocktail/Mistral-7B-Instruct-v0.2 0.5
/media/hangyu5/Home/Documents/Hugging-Face/LM_cocktail/xDAN-L1-Chat-RL-v1 0.5
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.embed_tokens.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.0.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.1.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.2.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.3.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.4.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.5.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.6.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.7.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.8.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.9.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
Loading checkpoint shards: 33%|███▎ | 1/3 [00:00<00:00, 9.42it/s]
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.10.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.11.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.12.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.13.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.14.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.15.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.16.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.17.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.18.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.19.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.20.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.21.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
Loading checkpoint shards: 67%|██████▋ | 2/3 [00:00<00:00, 9.31it/s]
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.22.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.23.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.24.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.25.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.26.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.27.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.28.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.29.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.30.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.self_attn.q_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.self_attn.k_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.self_attn.v_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.self_attn.o_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.mlp.gate_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.mlp.up_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.mlp.down_proj.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.input_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.layers.31.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for model.norm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
/home/hangyu5/anaconda3/envs/llmcocktail/lib/python3.11/site-packages/torch/nn/modules/module.py:2025: UserWarning: for lm_head.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00, 9.45it/s]
Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00, 9.42it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Using pad_token, but it is not set yet.
Remove temporary file: /tmp/tmpp5xfq8r2.ckpt
Remove temporary directory: /tmp/tmpc0jeswnf
Saving the new model to ./mixed_llm
MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
      )
    )
    (norm): MistralRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)
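
Note (not part of the original log output): the module summary above lists every weight shape, so the size of the merged model can be estimated by hand. The sketch below is a rough cross-check in plain Python, not something the merge script itself runs. It assumes each MistralRMSNorm holds a single weight vector of length 4096 (the summary does not print that shape), that the rotary embedding and activation contribute no trainable parameters, and that embed_tokens and lm_head are stored as separate tensors. The names `linear_params` and `count_parameters` are hypothetical helpers introduced only for this illustration.

```python
# Dimensions copied from the printed MistralForCausalLM summary.
VOCAB = 32000
HIDDEN = 4096
KV_DIM = 1024          # out_features of k_proj and v_proj
MLP_DIM = 14336        # out_features of gate_proj and up_proj
NUM_LAYERS = 32        # "(0-31): 32 x MistralDecoderLayer"


def linear_params(in_features: int, out_features: int) -> int:
    """Weights in a bias-free Linear layer (bias=False throughout the summary)."""
    return in_features * out_features


def count_parameters() -> int:
    """Sum the weights implied by the printed shapes."""
    attention = (
        linear_params(HIDDEN, HIDDEN)      # q_proj
        + linear_params(HIDDEN, KV_DIM)    # k_proj
        + linear_params(HIDDEN, KV_DIM)    # v_proj
        + linear_params(HIDDEN, HIDDEN)    # o_proj
    )
    mlp = (
        linear_params(HIDDEN, MLP_DIM)     # gate_proj
        + linear_params(HIDDEN, MLP_DIM)   # up_proj
        + linear_params(MLP_DIM, HIDDEN)   # down_proj
    )
    # Assumption: each RMSNorm has one weight vector of length HIDDEN.
    norms = 2 * HIDDEN                     # input_layernorm + post_attention_layernorm
    per_layer = attention + mlp + norms

    embed = VOCAB * HIDDEN                 # embed_tokens
    final_norm = HIDDEN                    # model.norm (same assumption as above)
    lm_head = linear_params(HIDDEN, VOCAB)
    return embed + NUM_LAYERS * per_layer + final_norm + lm_head


if __name__ == "__main__":
    total = count_parameters()
    print(f"estimated parameters: {total:,}")
```

Under these assumptions the estimate comes to roughly 7.24 billion parameters, which is in line with the "7B" in the model name.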