Initialize project; model provided by the ModelHub XC community

Model: PhysicsWallahAI/Aryabhata-1.0 Source: Original Platform
2026-05-13 15:06:36 +08:00
commit 832ef4fccf
20 changed files with 987 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,49 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bin.* filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zstandard filter=lfs diff=lfs merge=lfs -text
 *.tfevents* filter=lfs diff=lfs merge=lfs -text
 *.db* filter=lfs diff=lfs merge=lfs -text
 *.ark* filter=lfs diff=lfs merge=lfs -text
 **/*ckpt*data* filter=lfs diff=lfs merge=lfs -text
 **/*ckpt*.meta filter=lfs diff=lfs merge=lfs -text
 **/*ckpt*.index filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.gguf* filter=lfs diff=lfs merge=lfs -text
 *.ggml filter=lfs diff=lfs merge=lfs -text
 *.llamafile* filter=lfs diff=lfs merge=lfs -text
 *.pt2 filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,229 @@
 ---
 license: cc-by-nc-4.0
 tags:
 - small-language-model
 - jee
 - exam-centric
 - indian-education
 - reinforcement-learning
 - supervised-finetuning
 - model-merging
 - rejection-sampling
 - mathematics
 - ai4education
 - physicswallah
 language:
 - en
 model_name: PhysicsWallah/Aryabhata-1.0
 model_creator: Physics Wallah AI Research
 model_type: Causal decoder-based model
 base_model: Qwen/Qwen2.5-Math-7B
 pipeline_tag: text-generation
 library_name: transformers
 ---
 # Aryabhata 1.0: An exam-focused language model for JEE Math
 ![](benchmark.png)
 ## Overview
 **Aryabhata 1.0** is a 7B-parameter small language model for mathematics developed by **Physics Wallah AI Research**, optimized for high-stakes Indian competitive exams such as **JEE Mains**. Despite its compact size, Aryabhata 1.0 achieves **state-of-the-art performance** on exam-centric reasoning tasks with high **token efficiency** and low inference cost.
 > 🚧 *Aryabhata 1.0 is an **experimental release**. We are actively seeking feedback, so please contribute in the Discussion tab of this repo.*
 ---
 ## 🧠 Key Features
 - **Architecture**: 7B parameter causal decoder-based model.
 - **Exam-Centric Optimization**: Specifically tuned for JEE-level Mathematics reasoning.
 - **High Accuracy**:
   - **86%** on the **JEE Mains January 2025** session.
   - **90.2%** on the **JEE Mains April 2025** session.
 - **Token Efficiency**: Operates effectively within a **~2K-token window**, compared with the ~8K tokens required by other reasoning models.
 - **Compute Efficient**: Trained on a **1x2 NVIDIA H100 GPU** setup using an optimized pipeline.
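The "7B" figure can be cross-checked against the Qwen2 hyperparameters in this repository's `config.json`. The sketch below assumes the standard Qwen2 layer layout (biases on the q/k/v projections only, two RMSNorm weight vectors per layer, a final norm, and a separate `lm_head` because `tie_word_embeddings` is false). Under those assumptions it reproduces the `total_parameters` value in `model.safetensors.index.json`, and multiplying by 4 bytes per `float32` weight gives the index's `total_size` (about 30.5 GB of weights).

```python
# Hyperparameters copied from config.json (Qwen2 architecture).
vocab_size = 152064
hidden = 3584
intermediate = 18944
layers = 28
heads = 28
kv_heads = 4

head_dim = hidden // heads        # 128
kv_dim = kv_heads * head_dim      # 512 (grouped-query attention: 4 KV heads)


def linear(n_in, n_out, bias=False):
    """Parameter count of a dense layer."""
    return n_in * n_out + (n_out if bias else 0)


per_layer = (
    linear(hidden, hidden, bias=True)     # q_proj
    + linear(hidden, kv_dim, bias=True)   # k_proj
    + linear(hidden, kv_dim, bias=True)   # v_proj
    + linear(hidden, hidden)              # o_proj (no bias)
    + 3 * linear(hidden, intermediate)    # gate/up (hidden->inter) and down (inter->hidden)
    + 2 * hidden                          # input_layernorm + post_attention_layernorm
)

embed = vocab_size * hidden     # model.embed_tokens
lm_head = vocab_size * hidden   # untied output projection
final_norm = hidden             # model.norm

total_params = layers * per_layer + embed + lm_head + final_norm
total_bytes = total_params * 4  # float32 = 4 bytes per parameter

print(f"{total_params:,} parameters, {total_bytes:,} bytes")
# 7,615,616,512 parameters, 30,462,466,048 bytes
```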
 ---
 ## 🛠️ Training Details
 - **Training Data**: ~130K problem-solution pairs curated from proprietary Physics Wallah exam datasets.
 - **Training Pipeline**:
   - **Model Merging**
   - **Rejection Sampling**
   - **Supervised Fine-Tuning (SFT)**
   - **Reinforcement Learning with Verifiable Rewards (RLVR)**
 ### 🔀 Model Merging
 We began with model merging (weighted averaging) to build a strong initialization (Aryabhata 0.5) by combining diverse model capabilities:
 * Qwen 2.5 Math: A robust math-centric LLM with solid symbolic math foundations.
 * Ace Math: An enhanced version of Qwen 2.5 Math, fine-tuned by NVIDIA for improved accuracy in mathematics benchmarks.
 * DeepSeek R1 Distill Qwen: A long-form reasoning model, fine-tuned on reasoning traces distilled from DeepSeek R1.
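Weighted-average merging takes a per-parameter convex combination of models that share an architecture. The sketch below is illustrative only: it assumes all checkpoints have identical parameter names and shapes (which weight averaging requires), stores tensors as flat Python lists to stay dependency-free, and uses made-up mixing weights; the actual weights used for Aryabhata 0.5 are not stated here.

```python
def merge_weighted(state_dicts, weights):
    """Return the per-parameter weighted average of several state dicts.

    state_dicts: list of dicts mapping parameter name -> flat list of floats.
    weights: one non-negative mixing weight per model; normalized to sum to 1.
    """
    if len(state_dicts) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]

    names = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != names:
            raise ValueError("models must share identical parameter names")

    merged = {}
    for name in names:
        tensors = [sd[name] for sd in state_dicts]
        if len({len(t) for t in tensors}) != 1:
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise weighted sum across models.
        merged[name] = [
            sum(w * t[i] for w, t in zip(norm, tensors))
            for i in range(len(tensors[0]))
        ]
    return merged


# Toy example: three hypothetical "models" with a single two-element parameter.
qwen_math = {"layer.weight": [1.0, 2.0]}
ace_math = {"layer.weight": [3.0, 4.0]}
r1_distill = {"layer.weight": [5.0, 6.0]}
merged = merge_weighted([qwen_math, ace_math, r1_distill], weights=[1, 1, 2])
print(merged)  # {'layer.weight': [3.5, 4.5]}
```

With real checkpoints the same loop would run over framework tensors rather than lists; the normalization step keeps the merged weights on the same scale as the inputs.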
 ### 📚 Data Curation + Rejection Sampling
 We extracted ~250K raw questions from Physics Wallah's internal database and applied aggressive filtering and cleaning:
 * Removed: diagram-based, non-English, and option-heavy questions.
 * Kept: questions matching the distribution of JEE Main 2019–2024.
 Final curated dataset: ~130K high-quality questions.
 For each question:
 * Generated 4 chain-of-thought (CoT) solutions using Aryabhata 0.5.
 * Retained only those leading to correct final answers.
 Resulting Dataset:
 * ~100K questions
 * ~350K high-quality CoTs
 We used this dataset for SFT.
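The filtering step above can be sketched as follows. `generate` is a hypothetical stand-in for sampling one chain of thought from Aryabhata 0.5, and answers are compared by exact string match on the last `\boxed{...}` expression, a simplification of whatever answer checking was actually used. A question whose samples are all wrong contributes nothing, which is how questions drop out of the SFT set.

```python
import itertools


def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None.

    Tracks brace depth so nested braces such as \\boxed{\\frac{1}{2}} work.
    """
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j].strip()
    return None  # unbalanced braces


def rejection_sample(dataset, generate, k=4):
    """Keep only sampled CoTs whose final boxed answer matches the reference.

    dataset: iterable of (question, reference_answer) pairs.
    generate: hypothetical callable(question) -> one sampled CoT string.
    Returns (question, cot) pairs; questions with no correct CoT are dropped.
    """
    kept = []
    for question, reference in dataset:
        for _ in range(k):
            cot = generate(question)
            if extract_boxed(cot) == reference.strip():
                kept.append((question, cot))
    return kept


# Hypothetical stub "model" that alternates between a right and a wrong answer.
canned = itertools.cycle(["... so x = \\boxed{\\frac{1}{2}}", "... hence \\boxed{2}"])
data = [("Solve 2x = 1", "\\frac{1}{2}")]
result = rejection_sample(data, lambda q: next(canned), k=4)
print(len(result))  # 2 of the 4 samples are kept
```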
 ### 🎯 Reinforcement Learning with Verifiable Rewards (RLVR)
 We used a custom in-house variant of Group Relative Policy Optimization (GRPO), adapted for math-specific reward functions.
 * Removed KL-divergence penalty
 * Removed clipping
 We used RLVR on the remaining ~30K questions.
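GRPO scores each sampled completion relative to the other completions for the same prompt instead of using a learned value model. The in-house variant is not published here, so the following is only a sketch of standard GRPO's group-relative advantage with the two stated changes (no KL penalty, no clipping). It assumes a binary verifiable reward (1 for a correct final answer, else 0) and uses per-sequence log-probabilities for brevity, whereas practical implementations usually work per token. `verifiable_reward` and `policy_loss` are illustrative names.

```python
import math


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalize each reward against its own group.

    rewards: rewards for the G completions sampled from one prompt.
    """
    g = len(rewards)
    mean = sum(rewards) / g
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / g)
    return [(r - mean) / (std + eps) for r in rewards]


def policy_loss(logp_new, logp_old, advantages):
    """Unclipped importance-weighted policy-gradient loss with no KL term.

    Standard GRPO would clip the ratio to [1 - eps, 1 + eps] and add a KL
    penalty against a reference model; both are omitted here.
    """
    terms = [
        math.exp(new - old) * adv
        for new, old, adv in zip(logp_new, logp_old, advantages)
    ]
    return -sum(terms) / len(terms)  # minimize the negative surrogate


def verifiable_reward(predicted, reference):
    """Binary reward: 1.0 for an exactly matching final answer, else 0.0."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0


# One prompt, four sampled completions (hypothetical final answers).
answers = ["2", "3", "2", "5"]
rewards = [verifiable_reward(a, "2") for a in answers]  # [1.0, 0.0, 1.0, 0.0]
adv = group_advantages(rewards)                         # about [1, -1, 1, -1]
logp_old = [-1.0, -1.0, -1.0, -1.0]
logp_new = [-0.9, -1.0, -0.9, -1.0]  # policy now favors the correct completions
loss = policy_loss(logp_new, logp_old, adv)
print(rewards, [round(a, 3) for a in adv], loss < 0)
```

If every completion in a group gets the same reward (all correct or all wrong), all advantages are 0 and that prompt contributes no gradient.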
 This multi-phase training strategy allows Aryabhata 1.0 to capture **pedagogy-aligned reasoning patterns**, making it highly effective for solving real student queries in mathematics.
 ---
 ## 📊 Performance Highlights
 ### Evaluation Setup
 All evaluations were performed with temperature = 0.0, and we report pass@1 accuracy.
 #### Evaluation Datasets
 We evaluated the model on two sets of official JEE Mains 2025 mathematics papers:
 * January Session: 10 question papers containing 250 questions.
 * April Session: 9 question papers containing 225 questions.
 Each paper includes a mix of:
 * Multiple Choice Questions (MCQs) with one correct option
 * Numeric Answer Type (NAT) questions requiring precise numerical responses
 #### Evaluation Metric
 We used a composite evaluation metric to reflect real-world grading rigor and reduce false positives:
 1. Float Match
    * Compares predicted and target answers within a tolerance (±1e-9)
    * Handles rounding artifacts and small numerical errors robustly
 2. String Match
    * Used for symbolic answers (e.g., fractions, radicals)
    * Uses strict exact match: predictions must match the ground truth character-for-character
 3. LLM-as-Judge (GPT-4o-mini)
    * Used to judge mathematical equivalence when answers appear in ambiguous formats
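One way to combine the three checks is sketched below. The cascade order (float match, then exact string match, then the judge) is an assumption, and `judge` is a hypothetical callable that in practice would wrap a GPT-4o-mini request, which is not shown.

```python
import math


def float_match(pred, target, tol=1e-9):
    """True if both strings parse as floats and agree within an absolute tolerance."""
    try:
        return math.isclose(float(pred), float(target), rel_tol=0.0, abs_tol=tol)
    except ValueError:
        return False  # not numeric, e.g. "\\sqrt{2}"


def string_match(pred, target):
    """Strict character-for-character comparison."""
    return pred == target


def is_correct(pred, target, judge=None):
    """Composite check: float match, then exact string match, then optional judge.

    judge: hypothetical callable(pred, target) -> bool, e.g. wrapping an LLM.
    """
    if float_match(pred, target) or string_match(pred, target):
        return True
    return bool(judge(pred, target)) if judge is not None else False


def pass_at_1(preds, targets, judge=None):
    """Fraction of questions answered correctly with one (greedy) sample each."""
    hits = sum(is_correct(p, t, judge) for p, t in zip(preds, targets))
    return hits / len(targets)


print(float_match("0.5", "0.50"))           # True
print(string_match("\\frac{1}{2}", "1/2"))  # False; only a judge could accept this
print(pass_at_1(["4", "\\sqrt{2}", "7"], ["4.0", "\\sqrt{2}", "8"]))  # 2 of 3 correct
```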
 ### 🔹 Accuracy Comparison Across Models
 ![](accuracy.png)
 > *Aryabhata has the best accuracy on JEE Main Maths, on par with frontier models*
 ### 🔹 Accuracy vs Token Usage
 ![](accuracy-vs-token.png)
 > *Aryabhata is on par with frontier models in terms of accuracy vs token usage*
 ---
 ## 🔧 Intended Use
 **Primary Use Cases**:
 - Competitive exam preparation (JEE Main level mathematics problems)
 - Question answering and doubt-solving systems
 - Educational tutoring and concept explanation
 ## 💡 How to Use
 ### 🧪 Using with 🤗 Transformers
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
 model_id = "PhysicsWallahAI/Aryabhata-1.0"
 tokenizer = AutoTokenizer.from_pretrained(model_id)
 model = AutoModelForCausalLM.from_pretrained(model_id)
 # Define stop strings
 stop_strings = ["<|im_end|>", "<|end|>", "<im_start|>", "```python\n", "<|im_start|>", "]}}]}}]"]
 def strip_bad_tokens(s, stop_strings):
     # Remove a trailing stop string if generation ended on one
     for suffix in stop_strings:
         if s.endswith(suffix):
             return s[:-len(suffix)]
     return s
 # Create generation config (can also set temperature, top_p, etc.)
 generation_config = GenerationConfig(
     max_new_tokens=4096,
     stop_strings=stop_strings
 )
 query = 'Find all the values of \\sqrt[3]{1}'
 messages = [{'role': 'system', 'content': 'Think step-by-step; put only the final answer inside \\boxed{}.'},
             {'role': 'user', 'content': query}]
 text = tokenizer.apply_chat_template(
     messages,
     tokenize=False,
     add_generation_prompt=True
 )
 inputs = tokenizer([text], return_tensors="pt").to(model.device)
 # stop_strings requires passing the tokenizer to generate()
 outputs = model.generate(**inputs, generation_config=generation_config, tokenizer=tokenizer)
 # Decode only the newly generated tokens, not the echoed prompt
 new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
 print(strip_bad_tokens(tokenizer.decode(new_tokens, skip_special_tokens=True), stop_strings))
 ```
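For reference, `apply_chat_template(..., add_generation_prompt=True)` renders messages with the ChatML-style `chat_template.jinja` in this repository. The pure-Python sketch below mirrors that template for the no-tools case only (tool calls and tool responses are omitted) and is meant to show the prompt layout, not to replace `apply_chat_template`. Each turn ends with `<|im_end|>`, which is presumably why the examples list it as a stop string.

```python
DEFAULT_SYSTEM = "Please reason step by step, and put your final answer within \\boxed{}."


def render_chatml(messages, add_generation_prompt=True):
    """Approximate chat_template.jinja for plain system/user/assistant turns."""
    parts = []
    if messages and messages[0]["role"] == "system":
        # A leading system message replaces the default system prompt.
        parts.append(f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n")
        rest = messages[1:]
    else:
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
        rest = messages
    for m in rest:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # The model continues from here and should close its turn with <|im_end|>.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Find all the values of \\sqrt[3]{1}"}])
print(prompt)
```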
 ---
 ### ⚡ Using with vLLM
 To run the model efficiently using vLLM:
 ```python
 from vllm import LLM, SamplingParams
 # Initialize model (downloads from Hugging Face if not local)
 llm = LLM(model="PhysicsWallahAI/Aryabhata-1.0")
 # Define prompt and sampling configuration
 query = 'Find all the values of \\sqrt[3]{1}'
 messages = [{'role': 'system', 'content': 'Think step-by-step; put only the final answer inside \\boxed{}.'},
            {'role': 'user', 'content': query}]
 sampling_params = SamplingParams(temperature=0.0, max_tokens=4*1024, stop=["<|im_end|>", "<|end|>", "<im_start|>", "```python\n", "<|im_start|>", "]}}]}}]"])
 # Run inference
 results = llm.chat(messages, sampling_params)
 # Print result
 print(results[0].outputs[0].text.strip())
 ```
 ---
 Read more about Aryabhata 1.0 in our [Technical Report](https://arxiv.org/abs/2508.08665)
 ---
 ## 🚀 Roadmap
 **Aryabhata 2.0** (Upcoming):
 - Extending domain coverage to **Physics** and **Chemistry**
 - Supporting **JEE Advanced**, **NEET**, and **Foundation syllabus**
 - Further optimization for affordability and accuracy in real-time deployments
 ---
 ## 🤝 Citation
 If you use this model, please cite:
 ```bibtex
 @misc{Aryabhata2025,
   title = {Aryabhata 1.0: A compact, exam-focused language model tailored for mathematics in Indian competitive exams, especially JEE Main.},
   author = {Physics Wallah AI Research},
   year = {2025},
   note = {\url{https://huggingface.co/PhysicsWallahAI/Aryabhata-1.0}},
 }
 ```
--- a/accuracy-vs-token.png
+++ b/accuracy-vs-token.png
--- a/accuracy.png
+++ b/accuracy.png
--- a/benchmark.png
+++ b/benchmark.png
--- a/chat_template.jinja
+++ b/chat_template.jinja
@@ -0,0 +1,54 @@
 {%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0]['role'] == 'system' %}
        {{- messages[0]['content'] }}
    {%- else %}
        {{- 'Please reason step by step, and put your final answer within \\boxed{}.' }}
    {%- endif %}
    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
 {%- else %}
    {%- if messages[0]['role'] == 'system' %}
        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
    {%- else %}
        {{- '<|im_start|>system\nPlease reason step by step, and put your final answer within \\boxed{}.<|im_end|>\n' }}
    {%- endif %}
 {%- endif %}
 {%- for message in messages %}
    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
    {%- elif message.role == "assistant" %}
        {{- '<|im_start|>' + message.role }}
        {%- if message.content %}
            {{- '\n' + message.content }}
        {%- endif %}
        {%- for tool_call in message.tool_calls %}
            {%- if tool_call.function is defined %}
                {%- set tool_call = tool_call.function %}
            {%- endif %}
            {{- '\n<tool_call>\n{"name": "' }}
            {{- tool_call.name }}
            {{- '", "arguments": ' }}
            {{- tool_call.arguments | tojson }}
            {{- '}\n</tool_call>' }}
        {%- endfor %}
        {{- '<|im_end|>\n' }}
    {%- elif message.role == "tool" %}
        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
            {{- '<|im_start|>user' }}
        {%- endif %}
        {{- '\n<tool_response>\n' }}
        {{- message.content }}
        {{- '\n</tool_response>' }}
        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
            {{- '<|im_end|>\n' }}
        {%- endif %}
    {%- endif %}
 {%- endfor %}
 {%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
 {%- endif %}
--- a/config.json
+++ b/config.json
@@ -0,0 +1,60 @@
 {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "layer_types": [
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention",
    "full_attention"
  ],
  "max_position_embeddings": 131072,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "pad_token_id": 151643,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 10000,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.53.1",
  "use_cache": false,
  "use_mrope": false,
  "use_sliding_window": false,
  "vocab_size": 152064
 }
--- a/configuration.json
+++ b/configuration.json
@@ -0,0 +1 @@
 {"framework": "pytorch", "task": "text-generation", "allow_remote": true}
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,6 @@
 {
  "_from_model_config": true,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "transformers_version": "4.53.1"
 }
--- a/model-00001-of-00007.safetensors
+++ b/model-00001-of-00007.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:42c9b8d8d78a684e55c7a3903ac12b9754a4eaf428a24ff04ca87dfd1f4cc5dd
 size 4976687216
--- a/model-00002-of-00007.safetensors
+++ b/model-00002-of-00007.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:7ac54690921b7792357bbd023e774fb17bd2d8afccc7d9a86c1ddc0fa7566335
 size 4778622352
--- a/model-00003-of-00007.safetensors
+++ b/model-00003-of-00007.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:85fae0e42c985df72ba9e49a7784907293f363099a8a04daa0d4b6853ae83d43
 size 4932743960
--- a/model-00004-of-00007.safetensors
+++ b/model-00004-of-00007.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:0e247bccc7532c103d5c4b53c9f115da434bd1474f0a976fe1f06a3d80e108f7
 size 4932743992
--- a/model-00005-of-00007.safetensors
+++ b/model-00005-of-00007.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:f31dc4111e3269dd384211fe90080219f02ef38066628e0bd227599e911cea25
 size 4998852296
--- a/model-00006-of-00007.safetensors
+++ b/model-00006-of-00007.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:4d4d9b994e1bd366d69ecb06b70733a2bc1750ff7c4b9d2478a4de440426c6ec
 size 3662865184
--- a/model-00007-of-00007.safetensors
+++ b/model-00007-of-00007.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:dc93f2567b1519384e6a52bea5efb48670ffb0326ae01c6ac445201ed198a638
 size 2179989632
--- a/model.safetensors.index.json
+++ b/model.safetensors.index.json
@@ -0,0 +1,347 @@
 {
  "metadata": {
    "total_parameters": 7615616512,
    "total_size": 30462466048
  },
  "weight_map": {
    "lm_head.weight": "model-00007-of-00007.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.self_attn.k_proj.bias": "model-00001-of-00007.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.self_attn.q_proj.bias": "model-00001-of-00007.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00007.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.self_attn.k_proj.bias": "model-00001-of-00007.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.self_attn.q_proj.bias": "model-00001-of-00007.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.1.self_attn.v_proj.bias": "model-00001-of-00007.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.10.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.11.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.12.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.13.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.13.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.14.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.14.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.14.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.15.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.15.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.15.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.16.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.17.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.input_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.18.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.18.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.18.self_attn.k_proj.bias": "model-00004-of-00007.safetensors",
    "model.layers.18.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.self_attn.q_proj.bias": "model-00004-of-00007.safetensors",
    "model.layers.18.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.18.self_attn.v_proj.bias": "model-00004-of-00007.safetensors",
    "model.layers.18.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
    "model.layers.19.input_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.19.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.19.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.19.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.19.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.19.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.19.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.19.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.19.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
    "model.layers.19.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.2.input_layernorm.weight": "model-00001-of-00007.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
    "model.layers.2.self_attn.k_proj.bias": "model-00001-of-00007.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.2.self_attn.q_proj.bias": "model-00001-of-00007.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.2.self_attn.v_proj.bias": "model-00001-of-00007.safetensors",
    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
    "model.layers.20.input_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.20.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.20.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.20.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.20.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
    "model.layers.20.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.20.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.20.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.20.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
    "model.layers.20.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.21.input_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.21.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.21.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.21.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.21.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
    "model.layers.21.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.21.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.21.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
    "model.layers.21.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.21.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
    "model.layers.21.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.22.input_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.22.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.22.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.22.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.22.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
    "model.layers.22.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.22.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.22.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
    "model.layers.22.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.22.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
    "model.layers.22.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.23.input_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.23.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.23.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.23.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.23.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
    "model.layers.23.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
    "model.layers.23.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.23.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.23.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
    "model.layers.23.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.23.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
    "model.layers.23.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.24.input_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.24.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.24.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.24.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.24.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.24.self_attn.k_proj.bias": "model-00005-of-00007.safetensors",
    "model.layers.24.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.24.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.24.self_attn.q_proj.bias": "model-00005-of-00007.safetensors",
    "model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.24.self_attn.v_proj.bias": "model-00005-of-00007.safetensors",
    "model.layers.24.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.25.input_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.25.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.25.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.25.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.25.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.25.self_attn.k_proj.bias": "model-00006-of-00007.safetensors",
    "model.layers.25.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.25.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.25.self_attn.q_proj.bias": "model-00006-of-00007.safetensors",
    "model.layers.25.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.25.self_attn.v_proj.bias": "model-00006-of-00007.safetensors",
    "model.layers.25.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.26.input_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.26.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.26.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.26.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.26.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.26.self_attn.k_proj.bias": "model-00006-of-00007.safetensors",
    "model.layers.26.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.26.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.26.self_attn.q_proj.bias": "model-00006-of-00007.safetensors",
    "model.layers.26.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.26.self_attn.v_proj.bias": "model-00006-of-00007.safetensors",
    "model.layers.26.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.27.input_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.27.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.27.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.27.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.27.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
    "model.layers.27.self_attn.k_proj.bias": "model-00006-of-00007.safetensors",
    "model.layers.27.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.27.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.27.self_attn.q_proj.bias": "model-00006-of-00007.safetensors",
    "model.layers.27.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.27.self_attn.v_proj.bias": "model-00006-of-00007.safetensors",
    "model.layers.27.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
    "model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.3.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
    "model.layers.3.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
    "model.layers.3.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.3.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.3.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
    "model.layers.3.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.3.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
    "model.layers.3.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.4.input_layernorm.weight": "model-00002-of-00007.safetensors",
    "model.layers.4.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.4.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
    "model.layers.4.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
    "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.4.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
    "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.4.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
    "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.5.input_layernorm.weight": "model-00002-of-00007.safetensors",
    "model.layers.5.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.5.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
    "model.layers.5.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
    "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.5.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
    "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.5.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
    "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.6.input_layernorm.weight": "model-00002-of-00007.safetensors",
    "model.layers.6.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.6.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
    "model.layers.6.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
    "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.6.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
    "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.6.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
    "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.7.input_layernorm.weight": "model-00002-of-00007.safetensors",
    "model.layers.7.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.7.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
    "model.layers.7.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
    "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.7.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
    "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.7.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
    "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.8.input_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.8.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.8.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.8.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.8.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.8.self_attn.k_proj.bias": "model-00002-of-00007.safetensors",
    "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.8.self_attn.q_proj.bias": "model-00002-of-00007.safetensors",
    "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.8.self_attn.v_proj.bias": "model-00002-of-00007.safetensors",
    "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
    "model.layers.9.input_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.9.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.9.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.9.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
    "model.layers.9.self_attn.k_proj.bias": "model-00003-of-00007.safetensors",
    "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.9.self_attn.q_proj.bias": "model-00003-of-00007.safetensors",
    "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
    "model.layers.9.self_attn.v_proj.bias": "model-00003-of-00007.safetensors",
    "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
    "model.norm.weight": "model-00006-of-00007.safetensors"
  }
 }
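The `weight_map` above assigns every parameter to one of the seven shard files. Its keys are sorted as strings, which is why `model.layers.3` follows `model.layers.27`. A single layer can also straddle two shards: layer 24's q/k/v projections are in shard 5, while its MLP, both norms, and `o_proj` are in shard 6. The sketch below shows how a loader can invert the map so that each shard is opened only once. It assumes only that the index has already been parsed into a plain `dict`, and `group_by_shard` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict


def group_by_shard(weight_map: dict[str, str]) -> dict[str, list[str]]:
    """Invert a safetensors index `weight_map` (tensor name -> shard file)
    into shard file -> sorted tensor names, so each shard is read once."""
    shards: dict[str, list[str]] = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        shards[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in shards.items()}


# A few entries copied from the index above; layer 24 spans shards 5 and 6.
example_map = {
    "model.layers.24.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
    "model.layers.24.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
    "model.layers.24.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
    "model.norm.weight": "model-00006-of-00007.safetensors",
}

for shard, names in sorted(group_by_shard(example_map).items()):
    print(shard, names)
```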
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,23 @@
 {
  "bos_token": {
    "content": "<｜begin▁of▁sentence｜>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<｜end▁of▁sentence｜>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<｜end▁of▁sentence｜>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
 }
--- a/tokenizer.json
+++ b/tokenizer.json
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:e20ddafc659ba90242154b55275402edeca0715e5dbb30f56815a4ce081f4893
 size 11422778
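`tokenizer.json` is committed as a Git LFS pointer. The repository stores only three fields: the spec version, the SHA-256 of the real file, and its size in bytes (11,422,778 here). The sketch below checks a downloaded file against such a pointer. The helper names are hypothetical, and the code uses only Python's standard `hashlib`.

```python
import hashlib


def parse_lfs_pointer(text: str) -> tuple[str, int]:
    """Return the sha256 hex digest and byte size recorded in an LFS pointer."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return digest, int(fields["size"])


def matches_pointer(data: bytes, digest: str, size: int) -> bool:
    """True if `data` has exactly the size and sha256 given by the pointer."""
    return len(data) == size and hashlib.sha256(data).hexdigest() == digest


# Pointer text copied from the tokenizer.json hunk above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:e20ddafc659ba90242154b55275402edeca0715e5dbb30f56815a4ce081f4893
size 11422778
"""
digest, size = parse_lfs_pointer(pointer)
print(digest[:12], size)  # e20ddafc659b 11422778
```

Once the real file has been fetched (for example with `git lfs pull`), pass its bytes to `matches_pointer` together with the parsed digest and size; the result should be `True`. If it returns `False`, one likely cause is that the pointer text was checked out instead of the LFS object.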
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,194 @@
 {
  "add_bos_token": true,
  "add_eos_token": false,
  "add_prefix_space": null,
  "added_tokens_decoder": {
    "151643": {
      "content": "<｜end▁of▁sentence｜>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151644": {
      "content": "<｜User｜>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151645": {
      "content": "<｜Assistant｜>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151646": {
      "content": "<｜begin▁of▁sentence｜>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151647": {
      "content": "<|EOT|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151648": {
      "content": "<think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151649": {
      "content": "</think>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151650": {
      "content": "<|quad_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151651": {
      "content": "<|quad_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151652": {
      "content": "<|vision_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151653": {
      "content": "<|vision_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151654": {
      "content": "<|vision_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151655": {
      "content": "<|image_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151656": {
      "content": "<|video_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "151657": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151658": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151659": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151660": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151661": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151662": {
      "content": "<|fim_pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151663": {
      "content": "<|repo_name|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "151664": {
      "content": "<|file_sep|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "bos_token": "<｜begin▁of▁sentence｜>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<｜end▁of▁sentence｜>",
  "extra_special_tokens": {},
  "legacy": true,
  "model_max_length": 16384,
  "pad_token": "<｜end▁of▁sentence｜>",
  "sp_model_kwargs": {},
  "tokenizer_class": "LlamaTokenizerFast",
  "unk_token": null,
  "use_default_system_prompt": false
 }
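Two details of this tokenizer configuration are easy to miss. First, `pad_token` is the same string as `eos_token` (id 151643). Second, several added tokens, including `<think>` and `</think>`, are marked `"special": false`. The sketch below makes both points concrete by parsing a trimmed excerpt of the config with the standard `json` module. The excerpt keeps only the fields the sketch uses.

```python
import json

# Trimmed excerpt of tokenizer_config.json above; only the fields used here.
config_text = """
{
  "added_tokens_decoder": {
    "151643": {"content": "<｜end▁of▁sentence｜>", "special": true},
    "151646": {"content": "<｜begin▁of▁sentence｜>", "special": true},
    "151648": {"content": "<think>", "special": false},
    "151649": {"content": "</think>", "special": false}
  },
  "bos_token": "<｜begin▁of▁sentence｜>",
  "eos_token": "<｜end▁of▁sentence｜>",
  "pad_token": "<｜end▁of▁sentence｜>"
}
"""

config = json.loads(config_text)
decoder = config["added_tokens_decoder"]

# JSON object keys are strings, so convert the token ids back to int.
id_to_token = {int(token_id): entry["content"] for token_id, entry in decoder.items()}
token_to_id = {content: token_id for token_id, content in id_to_token.items()}

pad_id = token_to_id[config["pad_token"]]
eos_id = token_to_id[config["eos_token"]]
special_ids = sorted(int(token_id) for token_id, entry in decoder.items() if entry["special"])

print(pad_id, eos_id, pad_id == eos_id)  # 151643 151643 True
print(special_ids)                       # [151643, 151646]
```

Because padding and end-of-sequence share an id, the token id alone cannot tell them apart; the attention mask has to do that. In Hugging Face `transformers`, `decode(..., skip_special_tokens=True)` is expected to remove only tokens flagged special. If so, `<think>` and `</think>` would likely remain in the decoded text. Verify this against the library version in use.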
@@ -0,0 +1 @@
 {"framework": "pytorch", "task": "text-generation", "allow_remote": true}