Initialize project; model provided by the ModelHub XC community

Model: snap-stanford/humanlm-opinion Source: Original Platform
2026-05-06 23:56:46 +08:00
commit 485e0d9e0f
16 changed files with 152522 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,36 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,195 @@
 ---
 license: apache-2.0
 base_model: Qwen/Qwen3-8B
 datasets:
 - humanlm/humanual-opinion
 language:
 - en
 tags:
 - user-simulation
 - persona
 - grpo
 - reinforcement-learning
 - state-alignment
 - humanlm
 library_name: transformers
 pipeline_tag: text-generation
 ---
 # HumanLM-Opinion
 **HumanLM** is a user simulator that generates responses capturing the underlying states of real users (beliefs, emotions, stance, values, goals, communication style).
 This checkpoint is trained on the **Humanual-Opinion** benchmark, which contains Reddit users’ opinionated responses in personal-issue discussion threads.
 📄 **Paper:** [HumanLM: Simulating Users with State Alignment Beats Response Imitation]()  
 🌐 **Project Page:** [humanlm.stanford.edu](http://humanlm.stanford.edu/)
 ## Model Details
 - **Base Model:** [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)
 - **Training Method:** GRPO (Group Relative Policy Optimization) with state alignment
 - **Training Data:** Humanual-Opinion (4.6k Reddit users, 46k responses across 1k threads) 
 ### What Makes HumanLM Different?
 Unlike standard fine-tuning, which imitates surface-level language, HumanLM explicitly aligns the model along six psychologically grounded state dimensions:
 | Aspect | Dimensions | Description |
 |--------|------------|-------------|
 | Cognitive | belief, goal | What the user thinks is true; what they want to achieve |
 | Normative | value, stance | What matters to them; their position on specific topics |
 | Affective | emotion | How they feel about the situation |
 | Linguistic | communication | How they structure and express their message |
 During generation, the model reasons about these latent states in a `<think>` block before synthesizing the final response.
 ## Quickstart
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model_name = "humanlm/humanlm-opinion"
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
 )
 # User persona (summarized from history)
 persona = """
 Demographics:
  age group: Likely 30s-40s (parent of middle school-aged child)
  other: Parent of a neurodivergent middle school-aged child
 Interests:
  Family dynamics and interpersonal conflicts, particularly in AITA scenarios
  Wedding etiquette and boundary-setting in social situations
  Parent-child relationships and estrangement issues
  ...
 Values:
  Believes toxic traditions should not be perpetuated: 'I think it's telling these people find perpetuating a toxic tradition "easier" then getting someone a pair of socks'
  Values earned relationships over automatic family privileges: 'they have in no way earned that privilege'
  ...
 Communication:
  Balances empathy with practical advice in responses
  Employs humor and sarcasm occasionally: 'She could get a cheap tagging gun and reattach the tags'
  Makes direct, straightforward observations without excessive hedging
  ...
 Statistics:
  Uses exclamation points in most responses for emphasis
  Tends to structure responses with 2-4 distinct points or sentences
  Often begins responses with agreement or validation before adding commentary
  ...
 """
 # Context (e.g., a Reddit AITA post)
 context = """AITA for demanding that my niece, or her parents, pay me back for the hundreds of dollars of perfume she stole from me?
 (**Note: I’m not involving the police, suing anyone, etc. Please don't try to argue with me about this or "convince" me why I should.**)
 I have a perfume collection that I started when I was a teenager slinging burritos as my first job. I have over 400 bottles at this point, I take great pride in my collection, and I use it.
 I’m also happy to give people decants (samples) of most of my bottles, let them sample a spray or two, give some bottles as gifts, etc.
 ....
 AITA?"""
 messages = [
    # The system prompt is assembled from adjacent string literals so it stays valid Python.
    {"role": "system", "content": (
        "You are a real human user. Your name is HUMAN. You will be given your persona information below "
        "and you respond to any given context such as posts and messages.\n\n"
        f"Your persona:\n{persona}\n<|The End of Persona|>\n\n"
        "## Task and Output format:<response>\n<HUMAN's actual written comment or reply text.>\n</response>"
    )},
    {"role": "user", "content": context}
 ]
 text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True
 )
 model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
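 # Only the first token of the closing tag is kept; generation stops as soon as that token is produced.
 stop_token_id = tokenizer.encode("</response>", add_special_tokens=False)[0]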
 generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.4,
    no_repeat_ngram_size=4,
    eos_token_id=stop_token_id,
 )
 output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
 response = tokenizer.decode(output_ids, skip_special_tokens=True)
 ```
 ## Output Format
 The model generates responses in the following format:
 ```
 <think>
 [Reasoning about the user's latent states: stance, emotion, belief, value, goal, communication style]
 </think>
 <response>
 [The actual user response]
 </response>
 ```
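The format above can be split into its two parts with a small helper. This is a minimal sketch, not part of the released code: `parse_humanlm_output` is a hypothetical name. It assumes the decoded text may lack the opening `<think>` tag (the chat template already appends it to the prompt when `enable_thinking=True`) and may end in a truncated closing tag (the Quickstart stops on the first token of `</response>`).

```python
import re

CLOSING_TAG = "</response>"


def parse_humanlm_output(text: str) -> dict:
    """Split raw generated text into its reasoning and response parts."""
    reasoning, rest = "", text
    if "</think>" in text:
        # Everything before </think> is reasoning; the opening tag is optional.
        reasoning, rest = text.split("</think>", 1)
        reasoning = reasoning.replace("<think>", "", 1).strip()

    # Take the text after <response>, up to </response> or the end of the string.
    match = re.search(r"<response>(.*?)(?:</response>|$)", rest, flags=re.DOTALL)
    response = (match.group(1) if match else rest).strip()

    # Drop a dangling partial closing tag such as "</" or "</resp".
    for k in range(len(CLOSING_TAG) - 1, 1, -1):
        if response.endswith(CLOSING_TAG[:k]):
            response = response[:-k]
            break
    return {"reasoning": reasoning, "response": response.strip()}


parsed = parse_humanlm_output(
    "Okay, let's see.\n</think>\n<response>\nNTA! She took a risk.\n</"
)
print(parsed["response"])
```

In the Quickstart, the `response` string produced by `tokenizer.decode` could be passed to this helper directly.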
 ### Example Output
 ```
 <think>
 Okay, let's see.
 ...
 The user's persona is a parent of a neuro divergent middle schooler, values accountability, boundaries, and doesn't tolerate toxic traditions. They support personal accountability and consequences. They also use a conversational tone with exclamation points and short sentences.
 So, the key points here are: the niece stole perfume, sold it, the user documented the theft, expects repayment, and is cutting her off from their home.The sister is defending the niece, but the user is standing their ground.
 The user wants to know AITA, so the response should validate their stance. The persona would likely agree that the niece should face consequences, not be excused just because she's a teenager. The user's response should emphasize that she took a risk, didn't payoff, and that the user has the right to expect repayment. Also, the sister's argument about the niece needing money for a car doesn't matter because she chose to steal.
 Need to make sure the response is in line with the persona's values: accountability, boundaries， practical problem-solving, and not enabling bad behavior.
 </think>
 <response>
 NTA! She took a risk (selling stolen perfume, sneaking around), it didn’t pay off. You have every right to expect her to pay back the hundreds she stole and cut her off from your home. Her “kids make mistakes” excuse doesn’t excuse theft or entitlement. She’s lucky you’re not involving the law—she should face consequences.
 </response>
 ```
 ## Performance
 On Humanual-Opinion, HumanLM achieves a **7.6% relative improvement** in the alignment score assigned by an LLM judge over the best baseline (GRPO-think).
 ### Real-Time User Study (111 participants)
 - **41.4% win rate** on overall similarity (vs. 30.6% Qwen3-8B-think, 27.9% GRPO-think)
 - **76.6%** of responses rated as "quite natural" or "indistinguishable from human"
 - Statistically significant improvements (p < 0.05 vs Qwen3-8B-think, p < 0.01 vs GRPO-think)
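As a sanity check on the win rates above, the sketch below converts them into participant counts. It assumes, and this is an assumption not stated in the source, that each of the 111 participants cast exactly one overall-similarity vote among the three systems, so the three rates partition the participants.

```python
# Reported figures from the user study above.
participants = 111
win_rates = {"HumanLM": 0.414, "Qwen3-8B-think": 0.306, "GRPO-think": 0.279}

# Assumption: one vote per participant, so each rate maps to a whole number of votes.
counts = {name: round(rate * participants) for name, rate in win_rates.items()}

# Recompute the percentages from the integer counts to confirm they round back.
recomputed = {name: round(100 * n / participants, 1) for name, n in counts.items()}

print(counts)
print(recomputed)
```

Under that assumption the counts sum to 111 and round back to the reported percentages, so the three figures are mutually consistent.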
 ## Safety Evaluation
 To determine whether our user-state alignment training erodes the safety behavior of the underlying aligned base model (Qwen3-8B), we ran an adversarial safety evaluation with the [Azure AI Evaluation SDK](https://learn.microsoft.com/en-us/python/api/overview/azure/ai-evaluation-readme?view=azure-python). 
 We evaluated the base model and HumanLM using adversarial prompts intended to elicit unsafe behavior, comparing their tendency to generate harmful outputs.
 We use Azure AI's AdversarialSimulator to generate adversarial queries designed to elicit harmful responses. Each query is scored by four harm evaluators (Violence, Sexual, Self-Harm, and Hate/Unfairness) and assigned to the highest-scoring category. We iteratively generate queries until obtaining at least 20 per category (80 total).
 For each query, we generate responses from both the base model (Qwen3-8B-think) and HumanLM-Opinion, then score them using Azure AI's content safety evaluators on a 0–7 scale (0–1 = safe, 6–7 = unsafe). We report average harm scores per category, where lower scores indicate safer behavior.
 | Harm Type | Qwen3-8B-think | HumanLM |
 |-----------|----------------|---------|
 | Violence | 4.40 | **4.20** |
 | Sexual | **4.30** | 4.45 |
 | Self-harm | 3.90 | **3.60** |
 | Hate | **3.35** | 3.55 |
 *Scores range 0-7 (lower = safer). HumanLM shows comparable safety to the base model.*
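The aggregation described above (assign each query to its highest-scoring harm category, then average the response scores per category) can be sketched in plain Python. This is an illustration only: it does not call the Azure AI Evaluation SDK, the record layout and function names are hypothetical, the toy scores are made up, and ties are broken by category order as an assumption.

```python
from collections import defaultdict
from statistics import mean

# Category keys are illustrative labels, not SDK identifiers.
HARM_CATEGORIES = ("violence", "sexual", "self_harm", "hate_unfairness")


def assign_category(query_scores: dict) -> str:
    """Return the harm category with the highest evaluator score for a query."""
    # max() keeps the first category on ties (assumed tie-breaking rule).
    return max(HARM_CATEGORIES, key=lambda cat: query_scores[cat])


def average_by_category(records: list) -> dict:
    """Average response harm scores (0-7 scale) within each assigned category."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[assign_category(rec["query_scores"])].append(rec["response_score"])
    return {cat: mean(scores) for cat, scores in buckets.items()}


# Made-up toy records; real scores would come from the content safety evaluators.
records = [
    {"query_scores": {"violence": 6, "sexual": 0, "self_harm": 1, "hate_unfairness": 2},
     "response_score": 4},
    {"query_scores": {"violence": 5, "sexual": 1, "self_harm": 0, "hate_unfairness": 0},
     "response_score": 2},
    {"query_scores": {"violence": 0, "sexual": 0, "self_harm": 7, "hate_unfairness": 1},
     "response_score": 3},
]
averages = average_by_category(records)
print(averages)
```

Running the same aggregation once per model would give one column of a table like the one above.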
 ## Intended Use
 - **User research:** Understanding how different users respond to content
 - **Content testing:** Predicting how target audiences might react to posts, articles, or policies
 - **AI alignment:** Generating diverse user feedback for training collaborative AI systems
 - **Social simulation:** Modeling opinion dynamics in online communities
 ## Citation
 ```bibtex
@article{wu2026humanlm,
  title={HUMANLM: Simulating Users with State Alignment Beats Response Imitation},
  url={https://humanlm.stanford.edu/},
  author={Wu, Shirley and Choi, Evelyn and Khatua, Arpandeep and
          Wang, Zhanghan and He-Yueya, Joy and Weerasooriya, Tharindu Cyril and
          Wei, Wei and Yang, Diyi and Leskovec, Jure and Zou, James},
  year={2026}
 }
 ```
--- a/added_tokens.json
+++ b/added_tokens.json
@@ -0,0 +1,28 @@
 {
  "</think>": 151668,
  "</tool_call>": 151658,
  "</tool_response>": 151666,
  "<think>": 151667,
  "<tool_call>": 151657,
  "<tool_response>": 151665,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
 }
--- a/chat_template.jinja
+++ b/chat_template.jinja
@@ -0,0 +1,101 @@
 {%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
 {%- else %}
    {%- if messages[0].role == 'system' %}
        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
    {%- endif %}
 {%- endif %}
 {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
 {%- for message in messages[::-1] %}
    {%- set index = (messages|length - 1) - loop.index0 %}
    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
        {%- set ns.multi_step_tool = false %}
        {%- set ns.last_query_index = index %}
    {%- endif %}
 {%- endfor %}
 {%- for message in messages %}
    {%- if message.content is string %}
        {%- set content = message.content %}
    {%- else %}
        {%- set content = '' %}
    {%- endif %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
        {{- '<|im_start|>' + message.role + '\n' }}
        {%- if message.name %}<name>{{ message.name }}</name>
        {%- endif -%}
        {{- '\n' + message.content | trim + '<|im_end|>\n' }}
    {%- elif message.role == "assistant" %}
        {%- set reasoning_content = '' %}
        {%- if message.reasoning_content is string %}
            {%- set reasoning_content = message.reasoning_content %}
        {%- else %}
            {%- if '</think>' in content %}
                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
            {%- endif %}
        {%- endif %}
        {{- '<|im_start|>' + message.role + '\n' }}
        {%- if message.name %}<name>{{ message.name }}</name>
            {{- '\n' }}
        {%- endif %}
        {%- if loop.index0 > ns.last_query_index %}
            {%- if loop.last or (not loop.last and reasoning_content) %}
                {{- '<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
            {%- else %}
                {{- content }}
            {%- endif %}
        {%- else %}
            {{- content }}
        {%- endif %}
        {%- if message.tool_calls %}
            {%- for tool_call in message.tool_calls %}
                {%- if (loop.first and content) or (not loop.first) %}
                    {{- '\n' }}
                {%- endif %}
                {%- if tool_call.function %}
                    {%- set tool_call = tool_call.function %}
                {%- endif %}
                {{- '<tool_call>\n{"name": "' }}
                {{- tool_call.name }}
                {{- '", "arguments": ' }}
                {%- if tool_call.arguments is string %}
                    {{- tool_call.arguments }}
                {%- else %}
                    {{- tool_call.arguments | tojson }}
                {%- endif %}
                {{- '}\n</tool_call>' }}
            {%- endfor %}
        {%- endif %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
 {%- endfor %}
 {%- if add_generation_prompt %}
    {{- '<|im_start|>user\n' }}
    {%- if speak_as is defined and speak_as %}<name>{{ speak_as }}</name>
    {%- endif -%}
    {{- '\n' }}
    {%- if enable_thinking is defined and enable_thinking is false %}
        {{- '<think>\n\n</think>\n\n' }}
    {%- elif enable_thinking is defined and enable_thinking is true %}
        {{- '<think>' }}
    {%- endif %}
 {%- endif %}
--- a/config.json
+++ b/config.json
@@ -0,0 +1,68 @@
 {
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 12288,
  "layer_types": [
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention"
  ],
  "max_position_embeddings": 40960,
  "max_window_layers": 36,
  "model_type": "qwen3",
  "num_attention_heads": 32,
  "num_hidden_layers": 36,
  "num_key_value_heads": 8,
  "pad_token_id": 151643,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.55.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
 }
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,13 @@
 {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "temperature": 0.6,
  "top_k": 20,
  "top_p": 0.95,
  "transformers_version": "4.55.2"
 }
--- a/merges.txt
+++ b/merges.txt
--- a/model-00001-of-00004.safetensors
+++ b/model-00001-of-00004.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:53b0d9553f7f10565aed769df4d8107afbc44652c95c24eeb3d3130790d1dffc
 size 4924336344
--- a/model-00002-of-00004.safetensors
+++ b/model-00002-of-00004.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:d2712df816ca81b705a3caba02d9ef79c6503958ad0701144f6b0f540d1172ec
 size 4944248992
--- a/model-00003-of-00004.safetensors
+++ b/model-00003-of-00004.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:f3a37db985a6e4ec8ccf33e8dc863377098c8301cb0bcfa8c11cd79aabf331b4
 size 3984759752
--- a/model-00004-of-00004.safetensors
+++ b/model-00004-of-00004.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:3fd242c2d945849c73ecff26acd2136aa7bbfa495e8160d1d0205804cf6ce20a
 size 2528171712
--- a/model.safetensors.index.json
+++ b/model.safetensors.index.json
@@ -0,0 +1,407 @@
 {
  "metadata": {
    "total_parameters": 8190735360,
    "total_size": 16381470720
  },
  "weight_map": {
    "lm_head.weight": "model-00004-of-00004.safetensors",
    "model.embed_tokens.weight": "model-00002-of-00004.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
    "model.layers.0.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.0.self_attn.q_norm.weight": "model-00004-of-00004.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.1.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.1.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.self_attn.k_norm.weight": "model-00004-of-00004.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.10.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.11.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.11.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.12.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.12.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.13.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.13.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.14.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.14.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.15.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.15.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.16.input_layernorm.weight": "model-00004-of-00004.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.16.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.16.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.17.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.18.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.18.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.18.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.18.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.18.self_attn.k_norm.weight": "model-00004-of-00004.safetensors",
    "model.layers.18.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.18.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.18.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.18.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.18.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.19.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.19.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.19.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.19.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.19.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.19.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.19.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.19.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.2.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.2.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.2.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.20.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.20.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.20.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.21.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.21.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.21.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.21.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.21.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.21.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.21.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.22.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.22.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.22.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.22.self_attn.q_norm.weight": "model-00004-of-00004.safetensors",
    "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.23.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.23.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.23.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.input_layernorm.weight": "model-00004-of-00004.safetensors",
    "model.layers.24.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.24.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.24.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.24.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.24.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.24.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.24.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.24.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.25.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.25.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.25.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
    "model.layers.25.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.25.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.25.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.25.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.26.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.26.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.26.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.26.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.26.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.27.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.27.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.27.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.27.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.27.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.28.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.28.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.28.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.28.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.28.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.28.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.28.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.28.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.28.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.29.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.29.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.29.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.29.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.29.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.29.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.29.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.29.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.29.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.29.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.3.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.3.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.3.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.3.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.3.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.3.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.3.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.3.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.30.input_layernorm.weight": "model-00004-of-00004.safetensors",
    "model.layers.30.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.30.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.30.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.30.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.30.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.30.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.30.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.31.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.31.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.31.self_attn.k_norm.weight": "model-00004-of-00004.safetensors",
    "model.layers.31.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.31.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.31.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.31.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.31.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.32.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.32.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.32.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.32.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.32.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.32.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.32.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.32.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.32.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.32.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.32.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.33.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.33.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.33.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.33.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.33.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.33.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.33.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.33.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.33.self_attn.q_norm.weight": "model-00004-of-00004.safetensors",
    "model.layers.33.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.33.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.34.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.34.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.34.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.34.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.34.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.34.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.34.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.34.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.34.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.34.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.34.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.35.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.35.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.35.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.35.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.35.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.35.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.35.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.35.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.35.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.35.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.35.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.4.input_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.4.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.4.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.input_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.5.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.5.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.self_attn.k_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.6.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.6.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.6.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.6.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.7.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.7.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.7.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
    "model.layers.7.self_attn.k_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.7.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.7.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.7.self_attn.q_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.7.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.7.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.8.input_layernorm.weight": "model-00004-of-00004.safetensors",
    "model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
    "model.layers.8.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.8.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.8.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.8.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.8.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.8.self_attn.q_norm.weight": "model-00001-of-00004.safetensors",
    "model.layers.8.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.8.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.9.input_layernorm.weight": "model-00001-of-00004.safetensors",
    "model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.9.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
    "model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
    "model.layers.9.self_attn.k_norm.weight": "model-00003-of-00004.safetensors",
    "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
    "model.layers.9.self_attn.q_norm.weight": "model-00002-of-00004.safetensors",
    "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
    "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
    "model.norm.weight": "model-00003-of-00004.safetensors"
  }
 }
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,31 @@
 {
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
 }
--- a/tokenizer.json
+++ b/tokenizer.json
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
 size 11422654
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,239 @@
 {
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151648": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151649": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151665": {
      "content": "<tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151666": {
      "content": "</tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151667": {
      "content": "<think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151668": {
      "content": "</think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": null,
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "errors": "replace",
  "extra_special_tokens": {},
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
 }
--- a/vocab.json
+++ b/vocab.json