Initialize project; model provided by the ModelHub XC community

Model: hkust-nlp/drkernel-8b-coldstart Source: Original Platform
2026-06-04 12:06:18 +08:00
commit 19d7c433a4
19 changed files with 152692 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,36 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,280 @@
 ---
 library_name: transformers
 pipeline_tag: text-generation
 base_model: Qwen/Qwen3-8B-Base
 tags:
  - qwen3
  - triton
  - kernel-generation
  - supervised-finetuning
  - cold-start
  - code
 datasets:
  - hkust-nlp/drkernel-coldstart-8k
 ---
 # DR.Kernel-8B-ColdStart
 [![Model](https://img.shields.io/badge/🤗%20Model-hkust--nlp/drkernel--8b--coldstart-yellow)](https://huggingface.co/hkust-nlp/drkernel-8b-coldstart)
 [![Paper](https://img.shields.io/badge/arXiv-2602.05885-b31b1b)](https://arxiv.org/abs/2602.05885)
 `hkust-nlp/drkernel-8b-coldstart` is the **cold-start SFT checkpoint** for DR.Kernel.
 This model is trained on multi-turn SFT data only, and is intended as the initialization checkpoint before RL (TRLOO/MRS/PR/PRS).
 ## Model Summary
 - Model type: `Qwen3ForCausalLM`
 - Base model family: Qwen3-8B
 - Stage: cold-start supervised fine-tuning (before RL)
 - Main capability: structured kernel-optimization responses (`Model` -> `ModelNew`) with DR.Kernel prompt format
 ## Training Stage
 This checkpoint corresponds to:
 1. Cold-start SFT only
   - Dataset: `hkust-nlp/drkernel-coldstart-8k`
   - Multi-turn trajectories to teach kernel-generation/refinement behavior
 Not included in this checkpoint:
 - RL stage (TRLOO + MRS + PR + PRS)
 - RL reward shaping / rejection sampling updates
 Related script:
 - `drkernel/kernel/scripts/sft/8b-coldstart.sh`
 ## Intended Use
 - As an initialization checkpoint for DR.Kernel RL training
 - As a strong SFT baseline for kernel generation
 - For ablations comparing cold-start vs post-RL checkpoints
 ## Not Intended Use
 - Final performance claims for DR.Kernel RL results
 - Safety-critical production deployment without additional verification
 ## Quick Start (Transformers)
 Use the same fixed 1-shot first-turn prompt template as the DR.Kernel training data (recommended):
 ````python
 import textwrap
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 model_id = "hkust-nlp/drkernel-8b-coldstart"
 tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
 model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
 )
 ref_code = textwrap.dedent(
    """
    import torch
    import torch.nn as nn
    class Model(nn.Module):
        def __init__(self):
            super().__init__()
        def forward(self, x):
            x = torch.abs(x)
            x = x - 1.0
            return x
    def get_inputs():
        return [torch.randn(64, 128)]
    def get_init_inputs():
        return []
    """
 ).strip()
 example_ref_code = textwrap.dedent(
    """
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    class Model(nn.Module):
        def __init__(self) -> None:
            super().__init__()
        def forward(self, a, b):
            return a + b
    def get_inputs():
        # randomly generate input tensors based on the model architecture
        a = torch.randn(1, 128).cuda()
        b = torch.randn(1, 128).cuda()
        return [a, b]
    def get_init_inputs():
        # randomly generate tensors required for initialization based on the model architecture
        return []
    """
 ).strip()
 example_kernel_code = textwrap.dedent(
    '''
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import triton
    import triton.language as tl
    @triton.jit
    def add_kernel(
        x_ptr,  # Pointer to first input
        y_ptr,  # Pointer to second input
        out_ptr,  # Pointer to output
        n_elements,  # Total number of elements in input/output
        BLOCK_SIZE: tl.constexpr,
    ):
        # Each program handles a contiguous block of data of size BLOCK_SIZE
        block_start = tl.program_id(0) * BLOCK_SIZE
        # Create a range of offsets [0..BLOCK_SIZE-1]
        offsets = block_start + tl.arange(0, BLOCK_SIZE)
        # Mask to ensure we don't go out of bounds
        mask = offsets < n_elements
        # Load input values
        x = tl.load(x_ptr + offsets, mask=mask, other=0.0)
        y = tl.load(y_ptr + offsets, mask=mask, other=0.0)
        # Perform the elementwise addition
        out = x + y
        # Store the result
        tl.store(out_ptr + offsets, out, mask=mask)
    def triton_add(x: torch.Tensor, y: torch.Tensor):
        """
        This function wraps the Triton kernel call. It:
          1. Ensures the inputs are contiguous on GPU.
          2. Calculates the grid (blocks) needed.
          3. Launches the Triton kernel.
        """
        assert x.is_cuda and y.is_cuda, "Tensors must be on CUDA."
        x = x.contiguous()
        y = y.contiguous()
        # Prepare output tensor
        out = torch.empty_like(x)
        # Number of elements in the tensor
        n_elements = x.numel()
        BLOCK_SIZE = 128  # Tunable parameter for block size
        # Determine the number of blocks needed
        grid = lambda meta: ((n_elements + meta["BLOCK_SIZE"] - 1) // meta["BLOCK_SIZE"],)
        # Launch the Triton kernel
        add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=BLOCK_SIZE)
        return out
    class ModelNew(nn.Module):
        def __init__(self) -> None:
            super().__init__()
        def forward(self, a, b):
            # Instead of "return a + b", call our Triton-based addition
            return triton_add(a, b)
    '''
 ).strip()
 prompt_template = textwrap.dedent(
    """\
    You write custom Triton kernels to replace the pytorch operators in the given architecture to get speedups.
    You have complete freedom to choose the set of operators you want to replace. You may make the decision to replace some operators with custom Triton kernels and leave others unchanged. You may replace multiple operators with custom implementations, consider operator fusion opportunities (combining multiple operators into a single kernel, for example, combining matmul+relu), or algorithmic changes (such as online softmax). You are only limited by your imagination.
    Here's an example to show you the syntax of inline embedding custom Triton kernels in torch: The example given architecture is:
    ```python
    {example_ref_code}
    ```
    The example new arch with custom Triton kernels looks like this:
    ```python
    {example_kernel_code}
    ```
    You are given the following architecture:
    ```python
    {ref_code}
    ```
    Optimize the architecture named Model with custom Triton operators! Name your optimized output architecture ModelNew. Output the new code in codeblocks. Please generate real code, NOT pseudocode, make sure the code compiles and is fully functional. Let's think step by step.
    """
 ).strip()
 prompt = prompt_template.format(
    example_ref_code=example_ref_code,
    example_kernel_code=example_kernel_code,
    ref_code=ref_code,
 )
 messages = [{"role": "user", "content": prompt}]
 inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
 ).to(model.device)
 with torch.no_grad():
    outputs = model.generate(
        inputs,
        max_new_tokens=2048,
        do_sample=True,
        temperature=1.0,
        top_p=1.0,
    )
 # Only print newly generated tokens
 print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=False))
 ````
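The script above prints the raw generation, which usually mixes reasoning text with code. The sketch below shows one possible way to pull out the final `ModelNew` implementation. It assumes the model follows the prompt's instruction to put code in fenced blocks; the helper name `extract_model_new` is hypothetical and not part of the DR.Kernel codebase.

```python
import re
from typing import Optional

# Build the fence marker programmatically so this snippet can itself sit
# inside a Markdown code fence without terminating it.
FENCE = "`" * 3


def extract_model_new(generated_text: str) -> Optional[str]:
    """Return the last fenced code block that defines `ModelNew`, or None.

    Assumption: the response wraps code in triple-backtick fences, optionally
    tagged `python`, as requested by the prompt template.
    """
    pattern = re.compile(FENCE + r"(?:python)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)
    blocks = [m.group(1).strip() for m in pattern.finditer(generated_text)]
    # Keep only blocks that declare the class the prompt asks for.
    candidates = [
        b for b in blocks if re.search(r"^class ModelNew\b", b, re.MULTILINE)
    ]
    return candidates[-1] if candidates else None
```

Taking the last matching block is a design choice for multi-step answers where an early draft is later revised. Decoding with `skip_special_tokens=True` before extraction may give cleaner text.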
 ## Continue to RL Training
 This checkpoint is intended to be fed into RL training:
 - Script: `drkernel/kernel/scripts/rl/8b_trloo_mrs_pr_prs.sh`
 - Typical model setting: `MODEL_PATH="hkust-nlp/drkernel-8b-coldstart"` (or local path)
 - RL datasets:
  - `hkust-nlp/drkernel-rl-data`
  - `hkust-nlp/drkernel-validation-data`
 ## Data and Attribution
 - Cold-start SFT data:
  - [hkust-nlp/drkernel-coldstart-8k](https://huggingface.co/datasets/hkust-nlp/drkernel-coldstart-8k)
 - Query/task source includes:
  - [ByteDance-Seed/cudaLLM-data](https://huggingface.co/datasets/ByteDance-Seed/cudaLLM-data)
 - Benchmark source:
  - [KernelBench](https://github.com/ScalingIntelligence/KernelBench)
 Please acknowledge original dataset/benchmark authors when using this model.
 ## Related Resources
 - Final RL model: [hkust-nlp/drkernel-8b](https://huggingface.co/hkust-nlp/drkernel-8b)
 - Paper: [Dr.Kernel: Reinforcement Learning Done Right for Triton Kernel Generations](https://arxiv.org/abs/2602.05885)
 - Codebase: [KernelGYM](https://github.com/hkust-nlp/KernelGYM)
 - Training docs: `drkernel/README.md`
 ## Citation
 ```bibtex
@article{liuetal2026,
  title={Dr.Kernel: Reinforcement Learning Done Right for Triton Kernel Generations},
  author={Wei Liu and Jiawei Xu and Yingru Li and Longtao Zheng and Tianjian Li and Qian Liu and Junxian He},
  journal={arXiv:2602.05885},
  year={2026}
 }
 ```
--- a/added_tokens.json
+++ b/added_tokens.json
@@ -0,0 +1,28 @@
 {
  "</think>": 151668,
  "</tool_call>": 151658,
  "</tool_response>": 151666,
  "<think>": 151667,
  "<tool_call>": 151657,
  "<tool_response>": 151665,
  "<|box_end|>": 151649,
  "<|box_start|>": 151648,
  "<|endoftext|>": 151643,
  "<|file_sep|>": 151664,
  "<|fim_middle|>": 151660,
  "<|fim_pad|>": 151662,
  "<|fim_prefix|>": 151659,
  "<|fim_suffix|>": 151661,
  "<|im_end|>": 151645,
  "<|im_start|>": 151644,
  "<|image_pad|>": 151655,
  "<|object_ref_end|>": 151647,
  "<|object_ref_start|>": 151646,
  "<|quad_end|>": 151651,
  "<|quad_start|>": 151650,
  "<|repo_name|>": 151663,
  "<|video_pad|>": 151656,
  "<|vision_end|>": 151653,
  "<|vision_pad|>": 151654,
  "<|vision_start|>": 151652
 }
--- a/chat_template.jinja
+++ b/chat_template.jinja
@@ -0,0 +1,184 @@
 {% macro render_extra_keys(json_dict, handled_keys) %}
 {%- if json_dict is mapping %}
 {%- for json_key in json_dict if json_key not in handled_keys %}
 {%- if json_dict[json_key] is mapping %}
 {{- '
 <' ~ json_key ~ '>' ~ (json_dict[json_key] | tojson | safe) ~ '</' ~ json_key ~ '>' }}
 {%- else %}
 {{-'
 <' ~ json_key ~ '>' ~ (json_dict[json_key] | string) ~ '</' ~ json_key ~ '>' }}
 {%- endif %}
 {%- endfor %}
 {%- endif %}
 {% endmacro %}
 {%- if messages[0]["role"] == "system" %}
 {%- set system_message = messages[0]["content"] %}
 {%- set loop_messages = messages[1:] %}
 {%- else %}
 {%- set loop_messages = messages %}
 {%- endif %}
 {%- if not tools is defined %}
 {%- set tools = [] %}
 {%- endif %}
 {%- if system_message is defined %}
 {{- "<|im_start|>system
 " + system_message }}
 {%- else %}
 {%- if tools is iterable and tools | length > 0 %}
 {{- "<|im_start|>system
 You are Qwen, a helpful AI assistant that can interact with a computer to solve tasks." }}
 {%- endif %}
 {%- endif %}
 {%- if tools is iterable and tools | length > 0 %}
 {{- "
 You have access to the following functions:
 " }}
 {{- "<tools>" }}
 {%- for tool in tools %}
 {%- if tool.function is defined %}
 {%- set tool = tool.function %}
 {%- endif %}
 {{- "
 <function>
 <name>" ~ tool.name ~ "</name>" }}
 {%- if tool.description is defined %}
 {{- '
 <description>' ~ (tool.description | trim) ~ '</description>' }}
 {%- endif %}
 {{- '
 <parameters>' }}
 {%- if tool.parameters is defined and tool.parameters is mapping and tool.parameters.properties is defined and tool.parameters.properties is mapping %}
 {%- for param_name, param_fields in tool.parameters.properties|items %}
 {{- '
 <parameter>' }}
 {{- '
 <name>' ~ param_name ~ '</name>' }}
 {%- if param_fields.type is defined %}
 {{- '
 <type>' ~ (param_fields.type | string) ~ '</type>' }}
 {%- endif %}
 {%- if param_fields.description is defined %}
 {{- '
 <description>' ~ (param_fields.description | trim) ~ '</description>' }}
 {%- endif %}
 {%- set handled_keys = ['name', 'type', 'description'] %}
 {{- render_extra_keys(param_fields, handled_keys) }}
 {{- '
 </parameter>' }}
 {%- endfor %}
 {%- endif %}
 {% set handled_keys = ['type', 'properties'] %}
 {{- render_extra_keys(tool.parameters, handled_keys) }}
 {{- '
 </parameters>' }}
 {%- set handled_keys = ['type', 'name', 'description', 'parameters'] %}
 {{- render_extra_keys(tool, handled_keys) }}
 {{- '
 </function>' }}
 {%- endfor %}
 {{- "
 </tools>" }}
 {{- '
 If you choose to call a function ONLY reply in the following format with NO suffix:
 <tool_call>
 <function=example_function_name>
 <parameter=example_parameter_1>
 value_1
 </parameter>
 <parameter=example_parameter_2>
 This is the value for the second parameter
 that can span
 multiple lines
 </parameter>
 </function>
 </tool_call>
 <IMPORTANT>
 Reminder:
 - Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags
 - Required parameters MUST be specified
 - You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after
 - If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls
 </IMPORTANT>' }}
 {%- endif %}
 {%- if system_message is defined %}
 {{- '<|im_end|>
 ' }}
 {%- else %}
 {%- if tools is iterable and tools | length > 0 %}
 {{- '<|im_end|>
 ' }}
 {%- endif %}
 {%- endif %}
 {%- for message in loop_messages %}
 {%- if message.role == "assistant" and message.tool_calls is defined and message.tool_calls is iterable and message.tool_calls | length > 0 %}
 {{- '<|im_start|>' + message.role }}
 {%- if message.content is defined and message.content is string and message.content | trim | length > 0 %}
 {{- '
 ' + message.content | trim + '
 ' }}
 {%- endif %}
 {%- for tool_call in message.tool_calls %}
 {%- if tool_call.function is defined %}
 {%- set tool_call = tool_call.function %}
 {%- endif %}
 {{- '
 <tool_call>
 <function=' + tool_call.name + '>
 ' }}
 {%- if tool_call.arguments is defined %}
 {%- for args_name, args_value in tool_call.arguments|items %}
 {{- '<parameter=' + args_name + '>
 ' }}
 {%- set args_value = args_value | tojson | safe if args_value is mapping else args_value | string %}
 {{- args_value }}
 {{- '
 </parameter>
 ' }}
 {%- endfor %}
 {%- endif %}
 {{- '</function>
 </tool_call>' }}
 {%- endfor %}
 {{- '<|im_end|>
 ' }}
 {%- elif message.role == "user" or message.role == "system" or message.role == "assistant" %}
 {{- '<|im_start|>' + message.role + '
 ' + message.content + '<|im_end|>' + '
 ' }}
 {%- elif message.role == "tool" %}
 {%- if loop.previtem and loop.previtem.role != "tool" %}
 {{- '<|im_start|>user
 ' }}
 {%- endif %}
 {{- '<tool_response>
 ' }}
 {{- message.content }}
 {{- '
 </tool_response>
 ' }}
 {%- if not loop.last and loop.nextitem.role != "tool" %}
 {{- '<|im_end|>
 ' }}
 {%- elif loop.last %}
 {{- '<|im_end|>
 ' }}
 {%- endif %}
 {%- else %}
 {{- '<|im_start|>' + message.role + '
 ' + message.content + '<|im_end|>
 ' }}
 {%- endif %}
 {%- endfor %}
 {%- if add_generation_prompt %}
 {{- '<|im_start|>assistant
 ' }}
 {%- endif %}
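For plain conversations (no `tools`, no `tool_calls`, no `tool` role), the template above reduces to standard ChatML framing. The sketch below mirrors only that path in Python, as a reading aid rather than a replacement for `tokenizer.apply_chat_template`. The function name `render_chatml` is hypothetical, and the output assumes the Jinja whitespace control behaves as it does in the `transformers` template environment.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render system/user/assistant messages the way the no-tools path does.

    Sketch only: tool definitions, assistant tool calls, and `tool` messages
    take other branches of the template and are rejected here.
    """
    parts = []
    for message in messages:
        if message["role"] == "tool" or message.get("tool_calls"):
            raise ValueError("tool messages are outside the scope of this sketch")
        # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open the assistant turn so generation continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_chatml([{"role": "user", "content": "Optimize Model with Triton."}]))
```

A leading system message ends up with the same framing, because the template emits `<|im_start|>system`, the content, and `<|im_end|>` before entering the message loop.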
--- a/config.json
+++ b/config.json
@@ -0,0 +1,68 @@
 {
  "architectures": [
    "Qwen3ForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 12288,
  "layer_types": [
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention"
  ],
  "max_position_embeddings": 32768,
  "max_window_layers": 36,
  "model_type": "qwen3",
  "num_attention_heads": 32,
  "num_hidden_layers": 36,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.55.0",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
 }
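As a sanity check, the parameter count implied by this config can be worked out by hand and compared with `total_size` in `model.safetensors.index.json`. The sketch below assumes the usual Qwen3 layout: bias-free q/k/v/o projections, per-head `q_norm`/`k_norm` weights of size `head_dim`, a three-matrix gated MLP, two RMSNorm weights per layer, and one final norm. The final norm is an assumption, since it does not appear in the portion of the weight map shown here.

```python
# Values copied from config.json above.
cfg = {
    "hidden_size": 4096,
    "intermediate_size": 12288,
    "num_hidden_layers": 36,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "vocab_size": 151936,
    "tie_word_embeddings": False,
}


def count_parameters(cfg: dict) -> int:
    h = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]
    attn = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v, o projections (no bias)
    qk_norm = 2 * cfg["head_dim"]                  # q_norm and k_norm weights
    mlp = 3 * h * cfg["intermediate_size"]         # gate, up, down projections
    layer_norms = 2 * h                            # input and post-attention norms
    per_layer = attn + qk_norm + mlp + layer_norms
    embed = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else embed
    final_norm = h                                 # assumed final model norm
    return cfg["num_hidden_layers"] * per_layer + embed + lm_head + final_norm


n_params = count_parameters(cfg)
print(n_params, n_params * 4)  # parameter count, and bytes at 4 bytes per float32 weight
```

Under these assumptions the count is 8,190,735,360 parameters, and at 4 bytes each (`"torch_dtype": "float32"`) that is 32,762,941,440 bytes, which equals the `total_size` recorded in the index. The `total_parameters` value in the index (255,960,480) is exactly 1/32 of this count; the source does not say why.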
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,6 @@
 {
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "max_new_tokens": 2048,
  "transformers_version": "4.55.0"
 }
--- a/merges.txt
+++ b/merges.txt
--- a/model-00001-of-00007.safetensors
+++ b/model-00001-of-00007.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:b27b36267c39d1ef6ea910fca525f5bec64b98c95015a094766357f41b800e24
 size 4972454376
--- a/model-00002-of-00007.safetensors
+++ b/model-00002-of-00007.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:735e564316c7f4f2f214c8203c74e37708377db33adfdd4beca7d1f7193f564c
 size 4832048608
--- a/model-00003-of-00007.safetensors
+++ b/model-00003-of-00007.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:d7ff4012ce2a19486eabc31344eacf8938d3e07e7c484466bc4964ba047228b0
 size 4832048656
--- a/model-00004-of-00007.safetensors
+++ b/model-00004-of-00007.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:07df84cbb0d64de9c5ee4bd0d09b1ee1418f638f539b6a2f329965a593894f26
 size 4999855528
--- a/model-00005-of-00007.safetensors
+++ b/model-00005-of-00007.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:8cee0b3ea997b52de8d017af303a4783e02b3c97406dd6579fdd3084151cc886
 size 4832048672
--- a/model-00006-of-00007.safetensors
+++ b/model-00006-of-00007.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:c576005c106ff0061e20ffc5b3201f08ba8f63840ec42c151461e50c66e0dd17
 size 4832048672
--- a/model-00007-of-00007.safetensors
+++ b/model-00007-of-00007.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:ad557cecf8d09347ef4ca8215c4244aef688aab6b967018ec4a0a2a8f8ac9bd5
 size 3462482728
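Each shard entry above is a Git LFS pointer, not the weights themselves: three `key value` lines giving the spec version, the SHA-256 object id, and the byte size. The sketch below shows one way to parse a pointer and check a downloaded file against it. The helper names `parse_lfs_pointer` and `verify_stream` are hypothetical, and the demo uses a small in-memory payload instead of a real shard.

```python
import hashlib
import io
from typing import BinaryIO


def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer file."""
    fields = dict(line.strip().split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_stream(stream: BinaryIO, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Hash a binary stream in chunks and compare digest and size to the pointer."""
    hasher = hashlib.new(pointer["algo"])
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)
        total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# Demo with a tiny stand-in payload; for a real shard, open the file in "rb" mode.
payload = b"example bytes"
pointer_text = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(payload).hexdigest()}\n"
    f"size {len(payload)}\n"
)
pointer = parse_lfs_pointer(pointer_text)
print(verify_stream(io.BytesIO(payload), pointer))
```

Reading in chunks keeps memory use flat, which matters for shards that are several gigabytes each.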
--- a/model.safetensors.index.json
+++ b/model.safetensors.index.json
@@ -0,0 +1,407 @@
 {
  "metadata": {
    "total_parameters": 255960480,
    "total_size": 32762941440
  },
  "weight_map": {
    "lm_head.weight": "model-00007-of-00007.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.14.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.14.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.15.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.15.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.input_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.19.input_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.19.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.19.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.19.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.19.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.19.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.19.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.19.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.19.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
    "model.layers.2.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.2.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.20.input_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.20.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.20.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.20.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.20.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
    "model.layers.20.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.20.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.20.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.20.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.21.input_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.21.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.21.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.21.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.21.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
    "model.layers.21.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.21.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.21.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
    "model.layers.21.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.21.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.22.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
    "model.layers.22.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.22.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.22.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
    "model.layers.22.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.22.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.23.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
    "model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.23.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
    "model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.24.input_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.24.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.24.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.24.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.24.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.24.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
    "model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.24.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.24.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
    "model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.25.input_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.25.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.25.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.25.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.25.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.25.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
    "model.layers.25.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.25.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.25.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
    "model.layers.25.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.25.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.26.input_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.26.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.26.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.26.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.26.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.26.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
    "model.layers.26.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.26.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.26.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
    "model.layers.26.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.26.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.27.input_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.27.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.27.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.27.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.27.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.27.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
    "model.layers.27.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.27.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.27.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
    "model.layers.27.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.27.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.28.input_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.28.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.28.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.28.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.28.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.28.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
    "model.layers.28.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.28.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.28.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
    "model.layers.28.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.28.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.29.input_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.29.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.29.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.29.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.29.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.29.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
    "model.layers.29.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.29.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.29.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
    "model.layers.29.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.29.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
    "model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
    "model.layers.3.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.3.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.30.input_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.30.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.30.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.30.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.30.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.30.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
    "model.layers.30.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.30.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.30.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
    "model.layers.30.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.30.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.31.input_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.31.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.31.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.31.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.31.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.31.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
    "model.layers.31.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.31.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.31.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
    "model.layers.31.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.31.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.32.input_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.32.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.32.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.32.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.32.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.32.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
    "model.layers.32.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.32.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.32.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
    "model.layers.32.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.32.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.33.input_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.33.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.33.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.33.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.33.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.33.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
    "model.layers.33.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.33.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.33.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
    "model.layers.33.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.33.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.34.input_layernorm.weight": "model-00007-of-00007.safetensors",
    "model.layers.34.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
    "model.layers.34.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.34.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.34.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
    "model.layers.34.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
    "model.layers.34.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.34.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.34.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
    "model.layers.34.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.34.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.35.input_layernorm.weight": "model-00007-of-00007.safetensors",
    "model.layers.35.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
    "model.layers.35.mlp.gate_proj.weight": "model-00007-of-00007.safetensors",
    "model.layers.35.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
    "model.layers.35.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
    "model.layers.35.self_attn.k_norm.weight": "model-00007-of-00007.safetensors",
    "model.layers.35.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
    "model.layers.35.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
    "model.layers.35.self_attn.q_norm.weight": "model-00007-of-00007.safetensors",
    "model.layers.35.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
    "model.layers.35.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
    "model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
    "model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
    "model.layers.4.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
    "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.4.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
    "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
    "model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
    "model.layers.5.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
    "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.5.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
    "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
    "model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
    "model.layers.6.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
    "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.6.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
    "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
    "model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
    "model.layers.7.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
    "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.7.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
    "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.8.input_layernorm.weight": "model-00002-of-00007.safetensors",
    "model.layers.8.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.8.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.8.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
    "model.layers.8.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
    "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.8.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
    "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.9.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
    "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.9.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
    "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
    "model.norm.weight": "model-00007-of-00007.safetensors"
  }
 }
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,31 @@
 {
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
 }
--- a/tokenizer.json
+++ b/tokenizer.json
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,239 @@
 {
  "add_bos_token": false,
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "151643": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151645": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151646": {
      "content": "<|object_ref_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|object_ref_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151648": {
      "content": "<|box_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151649": {
      "content": "<|box_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151665": {
      "content": "<tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151666": {
      "content": "</tool_response>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151667": {
      "content": "<think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151668": {
      "content": "</think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|im_start|>",
    "<|im_end|>",
    "<|object_ref_start|>",
    "<|object_ref_end|>",
    "<|box_start|>",
    "<|box_end|>",
    "<|quad_start|>",
    "<|quad_end|>",
    "<|vision_start|>",
    "<|vision_end|>",
    "<|vision_pad|>",
    "<|image_pad|>",
    "<|video_pad|>"
  ],
  "bos_token": null,
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|endoftext|>",
  "errors": "replace",
  "extra_special_tokens": {},
  "model_max_length": 131072,
  "pad_token": "<|endoftext|>",
  "split_special_tokens": false,
  "tokenizer_class": "Qwen2Tokenizer",
  "unk_token": null
 }
--- a/vocab.json
+++ b/vocab.json