Initialize project; model provided by the ModelHub XC community

Model: kehanlu/llama-3.2-8B-Instruct Source: Original Platform
2026-05-13 18:58:21 +08:00
commit 7b45b1320d
12 changed files with 2557 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,62 @@
+---
+library_name: transformers
+tags: []
+---
+
+This repository contains the text-only LLM portion of `meta-llama/Llama-3.2-11B-Vision-Instruct`.
+
+**How it was done**
+
+```python
+from collections import OrderedDict
+import torch
+from transformers import MllamaForConditionalGeneration, AutoModelForCausalLM
+from transformers.models.mllama.modeling_mllama import MllamaCrossAttentionDecoderLayer
+llama32_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
+llama32 = MllamaForConditionalGeneration.from_pretrained(
+    llama32_id,
+    torch_dtype=torch.bfloat16,
+    device_map="cuda:0",
+)
+
+new_layers = []
+for idx, layer in enumerate(llama32.language_model.model.layers):
+    if isinstance(layer, MllamaCrossAttentionDecoderLayer):
+        # Cross-attention layers only take effect when an image is provided.
+        # Skip them here, since we want a text-only model.
+        pass
+    else:
+        new_layers.append(layer)
+llama32.language_model.model.cross_attention_layers = []
+llama32.language_model.model.layers = torch.nn.ModuleList(new_layers)
+
+
+# llama32.language_model is now identical to Llama-3.1-8B-Instruct, except for the embedding size (+8 rows)
+# see: https://github.com/huggingface/transformers/blob/a22a4378d97d06b7a1d9abad6e0086d30fdea199/src/transformers/models/mllama/modeling_mllama.py#L1667C9-L1667C26
+new_llama32_state_dict = OrderedDict()
+for k, v in llama32.language_model.state_dict().items():
+    if k == "model.embed_tokens.weight":
+        v = v[:128256, :]
+    new_llama32_state_dict[k] = v
+
+
+# Load Llama-3.1-8B-Instruct to reuse its architecture
+llama31_id = "meta-llama/Llama-3.1-8B-Instruct"
+llama31 = AutoModelForCausalLM.from_pretrained(
+    llama31_id,
+    torch_dtype=torch.bfloat16,
+    device_map="cuda:1",
+)
+
+llama31.load_state_dict(new_llama32_state_dict)
+# <All keys matched successfully>
+
+llama31.save_pretrained("./my-cool-llama3.2")
+```
+
+
+**Note:**
+
+The original tokenizer's `tokenizer.chat_template` uses a `date_string` variable, which appends the current date when `tokenizer.apply_chat_template(messages)` is called.
+
+I removed this behavior in this repo. Keep this in mind when you load the tokenizer with `AutoTokenizer.from_pretrained(this_repo)`.
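
The two transformations in the README script (dropping cross-attention layers, then trimming the extra embedding rows) can be tried on toy data without downloading any weights. The following is a minimal sketch, not the author's code: `SelfAttentionLayer`, `CrossAttentionLayer`, `drop_cross_attention`, and `trim_embedding_rows` are hypothetical names, nested lists stand in for tensors, and the layer counts and row counts are made up for illustration.

```python
from collections import OrderedDict


class SelfAttentionLayer:
    """Hypothetical stand-in for a regular decoder layer."""


class CrossAttentionLayer:
    """Hypothetical stand-in for MllamaCrossAttentionDecoderLayer."""


def drop_cross_attention(layers):
    """Keep every layer that is not a cross-attention layer, preserving order."""
    return [layer for layer in layers if not isinstance(layer, CrossAttentionLayer)]


def trim_embedding_rows(state_dict, key, num_rows):
    """Copy state_dict, keeping only the first num_rows rows of state_dict[key]."""
    trimmed = OrderedDict()
    for name, value in state_dict.items():
        trimmed[name] = value[:num_rows] if name == key else value
    return trimmed


# Toy decoder: 10 layers with cross-attention at indices 3 and 8 (made-up sizes).
toy_layers = [
    CrossAttentionLayer() if idx in (3, 8) else SelfAttentionLayer()
    for idx in range(10)
]
kept_layers = drop_cross_attention(toy_layers)

# Toy weights: nested lists stand in for tensors; the embedding has 2 extra rows.
toy_state = OrderedDict(
    [
        ("model.embed_tokens.weight", [[float(row)] * 2 for row in range(6)]),
        ("lm_head.weight", [[0.0, 0.0] for _ in range(4)]),
    ]
)
trimmed_state = trim_embedding_rows(toy_state, "model.embed_tokens.weight", 4)

print(len(toy_layers), "->", len(kept_layers))
print(len(trimmed_state["model.embed_tokens.weight"]))
```

Row slicing with `value[:num_rows]` also works on a 2-D `torch.Tensor`, so the same helper shape would apply to the real state dict, with `128256` as the row count used in the README.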