Initialize project; model provided by the ModelHub XC community

Model: strykes/emberforge-3b-reasoner Source: Original Platform
2026-05-30 19:09:18 +08:00
commit 7c36fbd792
28 changed files with 5552 additions and 0 deletions
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,39 @@
 *.7z filter=lfs diff=lfs merge=lfs -text
 *.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
 *.bz2 filter=lfs diff=lfs merge=lfs -text
 *.ckpt filter=lfs diff=lfs merge=lfs -text
 *.ftz filter=lfs diff=lfs merge=lfs -text
 *.gz filter=lfs diff=lfs merge=lfs -text
 *.h5 filter=lfs diff=lfs merge=lfs -text
 *.joblib filter=lfs diff=lfs merge=lfs -text
 *.lfs.* filter=lfs diff=lfs merge=lfs -text
 *.mlmodel filter=lfs diff=lfs merge=lfs -text
 *.model filter=lfs diff=lfs merge=lfs -text
 *.msgpack filter=lfs diff=lfs merge=lfs -text
 *.npy filter=lfs diff=lfs merge=lfs -text
 *.npz filter=lfs diff=lfs merge=lfs -text
 *.onnx filter=lfs diff=lfs merge=lfs -text
 *.ot filter=lfs diff=lfs merge=lfs -text
 *.parquet filter=lfs diff=lfs merge=lfs -text
 *.pb filter=lfs diff=lfs merge=lfs -text
 *.pickle filter=lfs diff=lfs merge=lfs -text
 *.pkl filter=lfs diff=lfs merge=lfs -text
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *.rar filter=lfs diff=lfs merge=lfs -text
 *.safetensors filter=lfs diff=lfs merge=lfs -text
 saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.tar.* filter=lfs diff=lfs merge=lfs -text
 *.tar filter=lfs diff=lfs merge=lfs -text
 *.tflite filter=lfs diff=lfs merge=lfs -text
 *.tgz filter=lfs diff=lfs merge=lfs -text
 *.wasm filter=lfs diff=lfs merge=lfs -text
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
 gguf/Nanbeige4.1-3B-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
 gguf/Nanbeige4.1-3B-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
 gguf/Nanbeige4.1-3B-f16.gguf filter=lfs diff=lfs merge=lfs -text
--- a/README.md
+++ b/README.md
@@ -0,0 +1,76 @@
 ---
 language:
 - en
 license: apache-2.0
 tags:
 - transformers
 - safetensors
 - gguf
 - peft
 - qlora
 - reasoning
 base_model:
 - Nanbeige/Nanbeige4.1-3B
 library_name: transformers
 pipeline_tag: text-generation
 ---
 # EmberForge-3B-Reasoner
 Private finetuned Nanbeige4.1-3B reasoning release by `strykes`.
 ## Included Artifacts
 - Merged full model (Safetensors) at repo root for HF benchmarking
 - LoRA adapter in `adapter/`
 - GGUF in `gguf/`:
  - `Nanbeige4.1-3B-Q5_K_M.gguf`
  - `Nanbeige4.1-3B-Q4_K_M.gguf`
  - `Nanbeige4.1-3B-f16.gguf`
 - Optional archive in `archives/`
 ## Training Snapshot
 - Base: `Nanbeige/Nanbeige4.1-3B`
 - Method: Unsloth QLoRA -> merged weights
 - Data: ~3.5k synthetic reasoning samples
 - Epochs: 2
 - Sequence length: 4096
 ## Notes
 - Intended for research and benchmarking.
 - Validate outputs before critical use.
 ## Benchmarks (2026-02-24)
 ### Local lm-eval results (this finetune)
 | Task | Metric | Score |
 |---|---:|---:|
 | mmlu | acc,none | 59.98% |
 | gsm8k | exact_match,flexible-extract | 62.40% |
 | arc_challenge | acc_norm,none | 31.74% |
 | hellaswag | acc_norm,none | 56.07% |
 | winogrande | acc,none | 50.04% |
 | piqa | acc_norm,none | 63.22% |
 | boolq | acc,none | 74.37% |
 | truthfulqa_mc2 | acc,none | 45.34% |
 ### Public references
 - Base model (`Nanbeige/Nanbeige4.1-3B`) author-published benchmarks are listed in:
  - `benchmarks/lm-eval-2026-02-24/benchmark_comparison_public_2026-02-24.md`
 - Frontier references (Claude/GPT/Gemini) are included in the same comparison report.
 ### Reproducibility artifacts
 - `benchmarks/lm-eval-2026-02-24/summary_v3.tsv`
 - `benchmarks/lm-eval-2026-02-24/results_2026-02-24T00-06-21.474293.json`
 - `benchmarks/lm-eval-2026-02-24/run_v3.log`
 - `benchmarks/lm-eval-2026-02-24/benchmark_comparison_public_2026-02-24.md`
 ### Caveat
 Public model-card comparisons are not always apples-to-apples with lm-evaluation-harness settings (prompting, few-shot, decoding, and benchmark versions can differ).
--- a/adapter/README.md
+++ b/adapter/README.md
@@ -0,0 +1,210 @@
 ---
 base_model: Nanbeige/Nanbeige4.1-3B
 library_name: peft
 pipeline_tag: text-generation
 tags:
 - base_model:adapter:Nanbeige/Nanbeige4.1-3B
 - lora
 - sft
 - transformers
 - trl
 - unsloth
 ---
 # Model Card for Model ID
 <!-- Provide a quick summary of what the model is/does. -->
 ## Model Details
 ### Model Description
 <!-- Provide a longer summary of what this model is. -->
 - **Developed by:** [More Information Needed]
 - **Funded by [optional]:** [More Information Needed]
 - **Shared by [optional]:** [More Information Needed]
 - **Model type:** [More Information Needed]
 - **Language(s) (NLP):** [More Information Needed]
 - **License:** [More Information Needed]
 - **Finetuned from model [optional]:** [More Information Needed]
 ### Model Sources [optional]
 <!-- Provide the basic links for the model. -->
 - **Repository:** [More Information Needed]
 - **Paper [optional]:** [More Information Needed]
 - **Demo [optional]:** [More Information Needed]
 ## Uses
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 ### Direct Use
 <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
 [More Information Needed]
 ### Downstream Use [optional]
 <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
 [More Information Needed]
 ### Out-of-Scope Use
 <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
 [More Information Needed]
 ## Bias, Risks, and Limitations
 <!-- This section is meant to convey both technical and sociotechnical limitations. -->
 [More Information Needed]
 ### Recommendations
 <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
 Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 ## How to Get Started with the Model
 Use the code below to get started with the model.
 [More Information Needed]
 ## Training Details
 ### Training Data
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 [More Information Needed]
 ### Training Procedure
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 #### Preprocessing [optional]
 [More Information Needed]
 #### Training Hyperparameters
 - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 #### Speeds, Sizes, Times [optional]
 <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
 [More Information Needed]
 ## Evaluation
 <!-- This section describes the evaluation protocols and provides the results. -->
 ### Testing Data, Factors & Metrics
 #### Testing Data
 <!-- This should link to a Dataset Card if possible. -->
 [More Information Needed]
 #### Factors
 <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
 [More Information Needed]
 #### Metrics
 <!-- These are the evaluation metrics being used, ideally with a description of why. -->
 [More Information Needed]
 ### Results
 [More Information Needed]
 #### Summary
 ## Model Examination [optional]
 <!-- Relevant interpretability work for the model goes here -->
 [More Information Needed]
 ## Environmental Impact
 <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
 Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
 - **Hardware Type:** [More Information Needed]
 - **Hours used:** [More Information Needed]
 - **Cloud Provider:** [More Information Needed]
 - **Compute Region:** [More Information Needed]
 - **Carbon Emitted:** [More Information Needed]
 ## Technical Specifications [optional]
 ### Model Architecture and Objective
 [More Information Needed]
 ### Compute Infrastructure
 [More Information Needed]
 #### Hardware
 [More Information Needed]
 #### Software
 [More Information Needed]
 ## Citation [optional]
 <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
 **BibTeX:**
 [More Information Needed]
 **APA:**
 [More Information Needed]
 ## Glossary [optional]
 <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
 [More Information Needed]
 ## More Information [optional]
 [More Information Needed]
 ## Model Card Authors [optional]
 [More Information Needed]
 ## Model Card Contact
 [More Information Needed]
 ### Framework versions
 - PEFT 0.18.1
--- a/adapter/adapter_config.json
+++ b/adapter/adapter_config.json
@@ -0,0 +1,50 @@
 {
  "alora_invocation_tokens": null,
  "alpha_pattern": {},
  "arrow_config": null,
  "auto_mapping": {
    "base_model_class": "LlamaForCausalLM",
    "parent_library": "transformers.models.llama.modeling_llama",
    "unsloth_fixed": true
  },
  "base_model_name_or_path": "Nanbeige/Nanbeige4.1-3B",
  "bias": "none",
  "corda_config": null,
  "ensure_weight_tying": false,
  "eva_config": null,
  "exclude_modules": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 128,
  "lora_bias": false,
  "lora_dropout": 0,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "peft_version": "0.18.1",
  "qalora_group_size": 16,
  "r": 64,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "down_proj",
    "up_proj",
    "gate_proj",
    "o_proj",
    "k_proj",
    "v_proj",
    "q_proj"
  ],
  "target_parameters": null,
  "task_type": "CAUSAL_LM",
  "trainable_token_indices": null,
  "use_dora": false,
  "use_qalora": false,
  "use_rslora": false
 }
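As a rough cross-check of the adapter configuration above, the sketch below computes the LoRA scaling factor (`lora_alpha / r`, since `use_rslora` is false) and the number of adapter parameters. The layer dimensions are copied from the `config.json` added in this same commit. The helper names are hypothetical, and the count assumes one A and one B matrix per target module in every layer.

```python
# Values copied from adapter/adapter_config.json and config.json in this commit.
R = 64
LORA_ALPHA = 128
HIDDEN = 2560
INTERMEDIATE = 10496
NUM_LAYERS = 32
NUM_HEADS = 20
NUM_KV_HEADS = 4
HEAD_DIM = 128

# (in_features, out_features) of each linear layer listed in target_modules.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, NUM_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, NUM_KV_HEADS * HEAD_DIM),
    "o_proj": (NUM_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_scaling(alpha, r):
    """Standard (non-rsLoRA) scaling applied to the low-rank update."""
    return alpha / r


def lora_param_count(shapes, r, num_layers):
    """Each adapted layer adds A (r x in) and B (out x r), i.e. r * (in + out) values."""
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


params = lora_param_count(TARGET_SHAPES, R, NUM_LAYERS)
print("scaling:", lora_scaling(LORA_ALPHA, R))
print("adapter parameters:", params)
print("bytes at 4 bytes/param:", params * 4)
```

At 4 bytes per parameter this gives 455,081,984 bytes, about 60 KB below the 455,142,376-byte size recorded in the LFS pointer for `adapter/adapter_model.safetensors`. That would be consistent with float32 storage plus a file header, but it is an inference; the commit does not state the adapter dtype.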
--- a/adapter/adapter_model.safetensors
+++ b/adapter/adapter_model.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:7983f9ec6827018eeffa27618229f4c6a1326ee107c8fbe2c268301afcb47e22
 size 455142376
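Large binaries in this commit appear only as Git LFS pointer files like the one above. A minimal sketch of reading such a pointer follows. `parse_lfs_pointer` is a hypothetical helper, and it assumes the three-line `key value` layout shown in this diff.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into version, hash algorithm, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),  # size of the real object in bytes
    }


# Pointer content copied from adapter/adapter_model.safetensors in this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:7983f9ec6827018eeffa27618229f4c6a1326ee107c8fbe2c268301afcb47e22
size 455142376
"""

info = parse_lfs_pointer(POINTER)
print(f"{info['algo']} {info['oid'][:12]}... {info['size'] / 2**20:.1f} MiB")
```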
--- a/adapter/added_tokens.json
+++ b/adapter/added_tokens.json
@@ -0,0 +1,9 @@
 {
  "</think>": 166104,
  "</tool_call>": 166106,
  "<think>": 166103,
  "<tool_call>": 166105,
  "<|endoftext|>": 166102,
  "<|im_end|>": 166101,
  "<|im_start|>": 166100
 }
--- a/adapter/chat_template.jinja
+++ b/adapter/chat_template.jinja
@@ -0,0 +1,137 @@
        {%- if tools %}
            {{- '<|im_start|>system
 ' }}
            {%- if messages[0].role == 'system' %}
                {{- messages[0].content + '
 ' }}
            {%- else %} 
                {{- '你是一位工具函数调用专家，你会得到一个问题和一组可能的工具函数。根据问题，你需要进行一个或多个函数/工具调用以实现目的，请尽量尝试探索通过工具解决问题。
 如果没有一个函数可以使用，请直接使用自然语言回复用户。
 如果给定的问题缺少函数所需的参数，请使用自然语言进行提问，向用户询问必要信息。
 如果调用结果已经足够回答用户问题，请对历史结果进行总结，使用自然语言回复用户。' }} 
            {%- endif %}
            {{- "# Tools
 You may call one or more functions to assist with the user query.
 You are provided with function signatures within <tools></tools> XML tags:
 <tools>" }}
            {%- for tool in tools %}
                {{- "
 " }}
                {{- tool | tojson }}
            {%- endfor %}
            {{- "
 </tools>
 For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
 <tool_call>
 {\"name\": <function-name>, \"arguments\": <args-json-object>}
 </tool_call><|im_end|>
 " }}
        {%- else %}
            {%- if messages[0].role == 'system' %}
                {{- '<|im_start|>system
 ' + messages[0].content + '<|im_end|>
 ' }}
            {%- else %} 
                {{- '<|im_start|>system
 你是南北阁，一款由BOSS直聘自主研发并训练的专业大语言模型。<|im_end|>
 ' }} 
            {%- endif %}
        {%- endif %}
        {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
        {%- for message in messages[::-1] %}
            {%- set index = (messages|length - 1) - loop.index0 %}
            {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
                {%- set ns.multi_step_tool = false %}
                {%- set ns.last_query_index = index %}
            {%- endif %}
        {%- endfor %}
        {%- for message in messages %}
            {%- if message.content is string %}
                {%- set content = message.content %}
            {%- else %}
                {%- set content = '' %}
            {%- endif %}
            {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
                {{- '<|im_start|>' + message.role + '
 ' + content + '<|im_end|>' + '
 ' }}
            {%- elif message.role == "assistant" %}
                {%- set reasoning_content = '' %}
                {%- if message.reasoning_content is string %}
                    {%- set reasoning_content = message.reasoning_content %}
                {%- else %}
                    {%- if '</think>' in content %}
                        {%- set reasoning_content = content.split('</think>')[0].rstrip('
 ').split('<think>')[-1].lstrip('
 ') %}
                        {%- set content = content.split('</think>')[-1].lstrip('
 ') %}
                    {%- endif %}
                {%- endif %}
                {%- if loop.index0 > ns.last_query_index or keep_all_think or (extra_body is defined and extra_body.keep_all_think) %}
                    {%- if loop.last or (not loop.last and reasoning_content) %}
                        {{- '<|im_start|>' + message.role + '
 <think>
 ' + reasoning_content.strip('
 ') + '
 </think>
 ' + content.lstrip('
 ') }}
                    {%- else %}
                        {{- '<|im_start|>' + message.role + '
 ' + content }}
                    {%- endif %}
                {%- else %}
                    {{- '<|im_start|>' + message.role + '
 ' + content }}
                {%- endif %}
                {%- if message.tool_calls %}
                    {%- for tool_call in message.tool_calls %}
                        {%- if (loop.first and content) or (not loop.first) %}
                            {{- '
 ' }}
                        {%- endif %}
                        {%- if tool_call.function %}
                            {%- set tool_call = tool_call.function %}
                        {%- endif %}
                        {{- '<tool_call>
 {"name": "' }}
                        {{- tool_call.name }}
                        {{- '", "arguments": ' }}
                        {%- if tool_call.arguments is string %}
                            {{- tool_call.arguments }}
                        {%- else %}
                            {{- tool_call.arguments | tojson }}
                        {%- endif %}
                        {{- '}
 </tool_call>' }}
                    {%- endfor %}
                {%- endif %}
                {{- '<|im_end|>
 ' }}
            {%- elif message.role == "tool" %}
                {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
                    {{- '<|im_start|>user' }}
                {%- endif %}
                {{- '
 <tool_response>
 ' }}
                {{- content }}
                {{- '
 </tool_response>' }}
                {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
                    {{- '<|im_end|>
 ' }}
                {%- endif %}
            {%- endif %}
        {%- endfor %}
        {%- if add_generation_prompt %}
            {{- '<|im_start|>assistant
 ' }}
        {%- endif %}
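For assistant turns that carry no separate `reasoning_content` field, the template above splits the message text on `</think>`. The same string operations can be mirrored in plain Python as a sketch. `split_reasoning` is a hypothetical name and is not part of the repository.

```python
def split_reasoning(content):
    """Mirror the template's split of an assistant message into (reasoning, answer)."""
    reasoning = ""
    if "</think>" in content:
        # Text before the first </think>, minus the opening <think> tag and newlines.
        reasoning = (
            content.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
        )
        # Text after the last </think> becomes the visible answer.
        content = content.split("</think>")[-1].lstrip("\n")
    return reasoning, content


print(split_reasoning("<think>\nstep one\n</think>\nanswer"))
```

In the template, the extracted reasoning is only written back into the prompt for assistant turns after the last user query, unless `keep_all_think` is set.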
--- a/adapter/special_tokens_map.json
+++ b/adapter/special_tokens_map.json
@@ -0,0 +1,33 @@
 {
  "additional_special_tokens": [
    "<|endoftext|>"
  ],
  "bos_token": {
    "content": "<|im_start|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
 }
--- a/adapter/tokenizer.model
+++ b/adapter/tokenizer.model
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:fb41d04798b714520a9b075727b0226538b7330254299062742c50ec8374bc36
 size 2782298
--- a/adapter/tokenizer_config.json
+++ b/adapter/tokenizer_config.json
@@ -0,0 +1,103 @@
 {
  "add_bos_token": true,
  "add_eos_token": false,
  "add_prefix_space": true,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "166100": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "166101": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "166102": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "166103": {
      "content": "<think>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "166104": {
      "content": "</think>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "166105": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "166106": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|endoftext|>"
  ],
  "bos_token": "<|im_start|>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "extra_special_tokens": {},
  "legacy": false,
  "model_max_length": 262144,
  "pad_token": "<unk>",
  "padding_side": "left",
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": false
 }
--- a/added_tokens.json
+++ b/added_tokens.json
@@ -0,0 +1,9 @@
 {
  "</think>": 166104,
  "</tool_call>": 166106,
  "<think>": 166103,
  "<tool_call>": 166105,
  "<|endoftext|>": 166102,
  "<|im_end|>": 166101,
  "<|im_start|>": 166100
 }
--- a/benchmarks/lm-eval-2026-02-24/benchmark_comparison_public_2026-02-24.md
+++ b/benchmarks/lm-eval-2026-02-24/benchmark_comparison_public_2026-02-24.md
@@ -0,0 +1,70 @@
 # Emberforge 3B Benchmark Comparison (Public + Local)
 Generated: 2026-02-24
 ## 1) Your Finetuned Model (local lm-eval run)
 Model: `strykes/emberforge-3b-reasoner`
 | Task | Metric | Score |
 |---|---:|---:|
 | mmlu | acc,none | 59.98% |
 | gsm8k | exact_match,flexible-extract | 62.40% |
 | arc_challenge | acc_norm,none | 31.74% |
 | hellaswag | acc_norm,none | 56.07% |
 | winogrande | acc,none | 50.04% |
 | piqa | acc_norm,none | 63.22% |
 | boolq | acc,none | 74.37% |
 | truthfulqa_mc2 | acc,none | 45.34% |
 ## 2) Public Base Model (Nanbeige4.1-3B)
 Model: `Nanbeige/Nanbeige4.1-3B` (author-reported benchmarks)
 | Benchmark | Published Score |
 |---|---:|
 | Live-Code-Bench-V6 | 76.90% |
 | AIME 2026 I | 87.40% |
 | HMMT Nov | 77.92% |
 | GPQA | 83.80% |
 | HLE (Text-only) | 12.60% |
 | Arena-Hard-v2 | 73.20% |
 | BFCL-V4 | 56.50% |
 | Tau2-Bench | 48.57% |
 Note: Nanbeige's published benchmarks do not overlap with your lm-eval task set (`mmlu`, `gsm8k`, `arc_challenge`, etc.), so no apples-to-apples delta can be computed without rerunning identical tasks.
 ## 3) Public Frontier Reference (Claude / GPT / Gemini) on overlapping classic tasks
 Source benchmark table: Anthropic Claude 3 model card (March 2024).
 | Benchmark | Your model | Claude 3 Opus | Claude 3 Sonnet | GPT-4 | Gemini 1.0 Ultra | Gemini 1.5 Pro |
 |---|---:|---:|---:|---:|---:|---:|
 | MMLU (5-shot) | 59.98% | 86.80% | 79.00% | 86.40% | 83.70% | 81.90% |
 | GSM8K | 62.40% | 95.00% | 92.30% | 92.00% | 94.40% | 91.70% |
 | ARC-Challenge (25-shot) | 31.74% | 96.40% | 93.20% | 96.30% | — | — |
 | HellaSwag (10-shot) | 56.07% | 95.40% | 89.00% | 95.30% | 87.80% | 92.50% |
 | WinoGrande (5-shot) | 50.04% | 88.50% | 75.10% | 87.50% | — | — |
 ## 4) Latest Frontier Snapshot (2025-2026, non-overlapping tasks)
 Source benchmark table: Claude Opus 4.5 system card, Table 2.3.A.
 | Benchmark | Claude Opus 4.5 | Claude Sonnet 4.5 | Claude Opus 4.1 | Gemini 3 Pro | GPT-5.1 |
 |---|---:|---:|---:|---:|---:|
 | SWE-bench Verified | 80.9% | 77.2% | 74.5% | 76.2% | 76.3% |
 | Terminal-bench 2.0 | 59.3% | 50.0% | 46.5% | 54.2% | 47.6% |
 | ARC-AGI-2 (Verified) | 37.6% | 13.6% | — | 31.1% | 17.6% |
 | GPQA Diamond | 87.0% | 83.4% | 81.0% | 91.9% | 88.1% |
 | MMMU (validation) | 80.7% | 77.8% | 77.1% | — | 85.4% |
 | MMMLU | 90.8% | 89.1% | 89.5% | 91.8% | 91.0% |
 Note: These are newer references but still not directly comparable to your current lm-eval task set.
 ## 5) Caveats
 - Your run uses `lm-evaluation-harness` with specific settings; public model-card numbers may use different prompts, few-shot counts, decoding, or evaluation code.
 - Frontier references in Section 3 are older than current 2026 generations but are official primary-source numbers on overlapping classic benchmarks.
 - Frontier references in Section 4 are current (2025-2026) but mostly on different benchmarks.
 ## Sources
 - Local run artifact: `/workspace/evals/main_results_v3.json/strykes__emberforge-3b-reasoner/results_2026-02-24T00-06-21.474293.json`
 - Nanbeige model card: https://huggingface.co/Nanbeige/Nanbeige4.1-3B
 - Anthropic Claude 3 model card (benchmarks table): https://www-cdn.anthropic.com/c6a80a657af445f40e31afac050f3bf76d3b1404.pdf
 - Anthropic model cards index: https://www.anthropic.com/system-cards
 - Anthropic Claude Opus 4.5 system card: https://www-cdn.anthropic.com/bf10f64990cfda0ba858290be7b8cc6317685f47.pdf
--- a/benchmarks/lm-eval-2026-02-24/results_2026-02-24T00-06-21.474293.json
+++ b/benchmarks/lm-eval-2026-02-24/results_2026-02-24T00-06-21.474293.json
--- a/benchmarks/lm-eval-2026-02-24/run_v3.log
+++ b/benchmarks/lm-eval-2026-02-24/run_v3.log
--- a/benchmarks/lm-eval-2026-02-24/summary_v3.tsv
+++ b/benchmarks/lm-eval-2026-02-24/summary_v3.tsv
@@ -0,0 +1,70 @@
 task	metric	value
 arc_challenge	acc_norm,none	0.3174061433447099
 boolq	acc,none	0.7437308868501529
 gsm8k	exact_match,flexible-extract	0.6239575435936315
 hellaswag	acc_norm,none	0.560744871539534
 mmlu	acc,none	0.5997721122347244
 mmlu_abstract_algebra	acc,none	0.43
 mmlu_anatomy	acc,none	0.6074074074074074
 mmlu_astronomy	acc,none	0.6973684210526315
 mmlu_business_ethics	acc,none	0.62
 mmlu_clinical_knowledge	acc,none	0.6415094339622641
 mmlu_college_biology	acc,none	0.8263888888888888
 mmlu_college_chemistry	acc,none	0.53
 mmlu_college_computer_science	acc,none	0.54
 mmlu_college_mathematics	acc,none	0.5
 mmlu_college_medicine	acc,none	0.5953757225433526
 mmlu_college_physics	acc,none	0.5
 mmlu_computer_security	acc,none	0.68
 mmlu_conceptual_physics	acc,none	0.5872340425531914
 mmlu_econometrics	acc,none	0.35964912280701755
 mmlu_electrical_engineering	acc,none	0.6413793103448275
 mmlu_elementary_mathematics	acc,none	0.5317460317460317
 mmlu_formal_logic	acc,none	0.5
 mmlu_global_facts	acc,none	0.33
 mmlu_high_school_biology	acc,none	0.7548387096774194
 mmlu_high_school_chemistry	acc,none	0.6009852216748769
 mmlu_high_school_computer_science	acc,none	0.69
 mmlu_high_school_european_history	acc,none	0.7696969696969697
 mmlu_high_school_geography	acc,none	0.7272727272727273
 mmlu_high_school_government_and_politics	acc,none	0.7461139896373057
 mmlu_high_school_macroeconomics	acc,none	0.6435897435897436
 mmlu_high_school_mathematics	acc,none	0.45555555555555555
 mmlu_high_school_microeconomics	acc,none	0.7773109243697479
 mmlu_high_school_physics	acc,none	0.5165562913907285
 mmlu_high_school_psychology	acc,none	0.8
 mmlu_high_school_statistics	acc,none	0.5694444444444444
 mmlu_high_school_us_history	acc,none	0.7156862745098039
 mmlu_high_school_world_history	acc,none	0.7974683544303798
 mmlu_human_aging	acc,none	0.600896860986547
 mmlu_human_sexuality	acc,none	0.6946564885496184
 mmlu_humanities	acc,none	0.5300743889479277
 mmlu_international_law	acc,none	0.7851239669421488
 mmlu_jurisprudence	acc,none	0.7222222222222222
 mmlu_logical_fallacies	acc,none	0.6932515337423313
 mmlu_machine_learning	acc,none	0.42857142857142855
 mmlu_management	acc,none	0.6893203883495146
 mmlu_marketing	acc,none	0.8034188034188035
 mmlu_medical_genetics	acc,none	0.69
 mmlu_miscellaneous	acc,none	0.6717752234993615
 mmlu_moral_disputes	acc,none	0.5953757225433526
 mmlu_moral_scenarios	acc,none	0.2446927374301676
 mmlu_nutrition	acc,none	0.6764705882352942
 mmlu_other	acc,none	0.6269713550048278
 mmlu_philosophy	acc,none	0.6559485530546624
 mmlu_prehistory	acc,none	0.6265432098765432
 mmlu_professional_accounting	acc,none	0.4397163120567376
 mmlu_professional_law	acc,none	0.4745762711864407
 mmlu_professional_medicine	acc,none	0.6838235294117647
 mmlu_professional_psychology	acc,none	0.5915032679738562
 mmlu_public_relations	acc,none	0.6
 mmlu_security_studies	acc,none	0.7020408163265306
 mmlu_social_sciences	acc,none	0.6906077348066298
 mmlu_sociology	acc,none	0.7711442786069652
 mmlu_stem	acc,none	0.5883285759594037
 mmlu_us_foreign_policy	acc,none	0.78
 mmlu_virology	acc,none	0.45180722891566266
 mmlu_world_religions	acc,none	0.7192982456140351
 piqa	acc_norm,none	0.6322089227421109
 truthfulqa_mc2	acc,none	0.45340473177307805
 winogrande	acc,none	0.500394632991318
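The percentage tables in the README and the comparison report appear to be these raw fractions rendered with two decimals. A minimal sketch of that conversion is below. `to_percent_rows` is a hypothetical helper, and the sample text repeats only two rows of the TSV above.

```python
import csv
import io


def to_percent_rows(tsv_text, tasks):
    """Return (task, metric, 'NN.NN%') tuples for the requested tasks."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    values = {row["task"]: (row["metric"], float(row["value"])) for row in reader}
    return [(t, values[t][0], f"{values[t][1] * 100:.2f}%") for t in tasks]


# Two rows copied from summary_v3.tsv; the real file would be read from disk.
SAMPLE = (
    "task\tmetric\tvalue\n"
    "gsm8k\texact_match,flexible-extract\t0.6239575435936315\n"
    "mmlu\tacc,none\t0.5997721122347244\n"
)

for task, metric, score in to_percent_rows(SAMPLE, ["mmlu", "gsm8k"]):
    print(f"| {task} | {metric} | {score} |")
```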
--- a/chat_template.jinja
+++ b/chat_template.jinja
@@ -0,0 +1,137 @@
        {%- if tools %}
            {{- '<|im_start|>system
 ' }}
            {%- if messages[0].role == 'system' %}
                {{- messages[0].content + '
 ' }}
            {%- else %} 
                {{- '你是一位工具函数调用专家，你会得到一个问题和一组可能的工具函数。根据问题，你需要进行一个或多个函数/工具调用以实现目的，请尽量尝试探索通过工具解决问题。
 如果没有一个函数可以使用，请直接使用自然语言回复用户。
 如果给定的问题缺少函数所需的参数，请使用自然语言进行提问，向用户询问必要信息。
 如果调用结果已经足够回答用户问题，请对历史结果进行总结，使用自然语言回复用户。' }} 
            {%- endif %}
            {{- "# Tools
 You may call one or more functions to assist with the user query.
 You are provided with function signatures within <tools></tools> XML tags:
 <tools>" }}
            {%- for tool in tools %}
                {{- "
 " }}
                {{- tool | tojson }}
            {%- endfor %}
            {{- "
 </tools>
 For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
 <tool_call>
 {\"name\": <function-name>, \"arguments\": <args-json-object>}
 </tool_call><|im_end|>
 " }}
        {%- else %}
            {%- if messages[0].role == 'system' %}
                {{- '<|im_start|>system
 ' + messages[0].content + '<|im_end|>
 ' }}
            {%- else %} 
                {{- '<|im_start|>system
 你是南北阁，一款由BOSS直聘自主研发并训练的专业大语言模型。<|im_end|>
 ' }} 
            {%- endif %}
        {%- endif %}
        {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
        {%- for message in messages[::-1] %}
            {%- set index = (messages|length - 1) - loop.index0 %}
            {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
                {%- set ns.multi_step_tool = false %}
                {%- set ns.last_query_index = index %}
            {%- endif %}
        {%- endfor %}
        {%- for message in messages %}
            {%- if message.content is string %}
                {%- set content = message.content %}
            {%- else %}
                {%- set content = '' %}
            {%- endif %}
            {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
                {{- '<|im_start|>' + message.role + '
 ' + content + '<|im_end|>' + '
 ' }}
            {%- elif message.role == "assistant" %}
                {%- set reasoning_content = '' %}
                {%- if message.reasoning_content is string %}
                    {%- set reasoning_content = message.reasoning_content %}
                {%- else %}
                    {%- if '</think>' in content %}
                        {%- set reasoning_content = content.split('</think>')[0].rstrip('
 ').split('<think>')[-1].lstrip('
 ') %}
                        {%- set content = content.split('</think>')[-1].lstrip('
 ') %}
                    {%- endif %}
                {%- endif %}
                {%- if loop.index0 > ns.last_query_index or keep_all_think or (extra_body is defined and extra_body.keep_all_think) %}
                    {%- if loop.last or (not loop.last and reasoning_content) %}
                        {{- '<|im_start|>' + message.role + '
 <think>
 ' + reasoning_content.strip('
 ') + '
 </think>
 ' + content.lstrip('
 ') }}
                    {%- else %}
                        {{- '<|im_start|>' + message.role + '
 ' + content }}
                    {%- endif %}
                {%- else %}
                    {{- '<|im_start|>' + message.role + '
 ' + content }}
                {%- endif %}
                {%- if message.tool_calls %}
                    {%- for tool_call in message.tool_calls %}
                        {%- if (loop.first and content) or (not loop.first) %}
                            {{- '
 ' }}
                        {%- endif %}
                        {%- if tool_call.function %}
                            {%- set tool_call = tool_call.function %}
                        {%- endif %}
                        {{- '<tool_call>
 {"name": "' }}
                        {{- tool_call.name }}
                        {{- '", "arguments": ' }}
                        {%- if tool_call.arguments is string %}
                            {{- tool_call.arguments }}
                        {%- else %}
                            {{- tool_call.arguments | tojson }}
                        {%- endif %}
                        {{- '}
 </tool_call>' }}
                    {%- endfor %}
                {%- endif %}
                {{- '<|im_end|>
 ' }}
            {%- elif message.role == "tool" %}
                {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
                    {{- '<|im_start|>user' }}
                {%- endif %}
                {{- '
 <tool_response>
 ' }}
                {{- content }}
                {{- '
 </tool_response>' }}
                {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
                    {{- '<|im_end|>
 ' }}
                {%- endif %}
            {%- endif %}
        {%- endfor %}
        {%- if add_generation_prompt %}
            {{- '<|im_start|>assistant
 ' }}
        {%- endif %}
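Note on the template above: the assistant branch separates reasoning from the final answer by splitting the message on the `</think>` marker and trimming newlines. The Python sketch below mirrors that split/strip chain. The helper name `split_reasoning` is hypothetical and not part of this repository, and the behaviour when no `</think>` marker is present is an assumption, since the guarding condition is not visible in this hunk.

```python
def split_reasoning(content: str) -> tuple[str, str]:
    """Return (reasoning, answer) using the same split/strip chain as the template."""
    if "</think>" not in content:
        # Assumption: with no closing marker there is no reasoning to extract.
        return "", content
    # Text before the first </think>, then everything after the last <think>.
    reasoning = (
        content.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
    )
    # Text after the last </think>, with leading newlines removed.
    answer = content.split("</think>")[-1].lstrip("\n")
    return reasoning, answer


reasoning, answer = split_reasoning("<think>\nadd 2 and 2\n</think>\n\n4")
print(repr(reasoning), repr(answer))  # 'add 2 and 2' '4'
```

The template re-emits the reasoning inside `<think>` tags only for messages after the last user query, or when `keep_all_think` is set; earlier assistant turns are rendered with the answer alone.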
--- a/config.json
+++ b/config.json
@@ -0,0 +1,32 @@
 {
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 166100,
  "dtype": "float16",
  "embd_pdrop": 0.0,
  "eos_token_id": 166101,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 2560,
  "initializer_range": 0.02,
  "intermediate_size": 10496,
  "max_position_embeddings": 262144,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 20,
  "num_hidden_layers": 32,
  "num_key_value_heads": 4,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "resid_pdrop": 0.0,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 70000000,
  "tie_word_embeddings": false,
  "transformers_version": "4.57.6",
  "use_cache": true,
  "vocab_size": 166144
 }
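Note on config.json: the fields above are enough to derive the parameter count by hand. The sketch below assumes the usual Llama layout (q/k/v/o projections, a gated MLP with three matrices, two RMSNorm weights per layer, one final norm, no biases); the function name `llama_param_count` is hypothetical. The result agrees with the `total_parameters` value recorded in model.safetensors.index.json.

```python
def llama_param_count(cfg: dict) -> int:
    """Count weights for a bias-free Llama-style decoder described by cfg."""
    h = cfg["hidden_size"]
    q_out = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_out = cfg["num_key_value_heads"] * cfg["head_dim"]
    attn = h * q_out + 2 * h * kv_out + q_out * h   # q_proj, k_proj + v_proj, o_proj
    mlp = 3 * h * cfg["intermediate_size"]          # gate_proj, up_proj, down_proj
    norms = 2 * h                                   # input and post-attention RMSNorm
    embed = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else cfg["vocab_size"] * h
    final_norm = h
    return embed + lm_head + cfg["num_hidden_layers"] * (attn + mlp + norms) + final_norm


config = {
    "hidden_size": 2560,
    "head_dim": 128,
    "num_attention_heads": 20,
    "num_key_value_heads": 4,
    "intermediate_size": 10496,
    "num_hidden_layers": 32,
    "vocab_size": 166144,
    "tie_word_embeddings": False,
}
params = llama_param_count(config)
print(params)      # 3933637120
print(params * 2)  # 7867274240 bytes at float16
```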
--- a/generation_config.json
+++ b/generation_config.json
@@ -0,0 +1,7 @@
 {
  "_from_model_config": true,
  "bos_token_id": 166100,
  "eos_token_id": 166101,
  "pad_token_id": 0,
  "transformers_version": "4.57.6"
 }
--- a/gguf/Nanbeige4.1-3B-Q4_K_M.gguf
+++ b/gguf/Nanbeige4.1-3B-Q4_K_M.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:4a5a2f9028a7ff9959b5cc08fc01228ff67b9c7d0ddaa41c086acd3c43e4210b
 size 2443112064
--- a/gguf/Nanbeige4.1-3B-Q5_K_M.gguf
+++ b/gguf/Nanbeige4.1-3B-Q5_K_M.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:171f542b60aac86574aec155af15d036e4ca4d8c44f74d42eab770d17af19339
 size 2825268864
--- a/gguf/Nanbeige4.1-3B-f16.gguf
+++ b/gguf/Nanbeige4.1-3B-f16.gguf
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:113fea20515ed173bda89873e8dc81a24839872c5ad4d06cbbb477afabe24006
 size 7871576704
--- a/model-00001-of-00002.safetensors
+++ b/model-00001-of-00002.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:7ac64308cdbf331f061103bf29939acb3d8718f238f75903706de5ddae9fd16b
 size 4982284224
--- a/model-00002-of-00002.safetensors
+++ b/model-00002-of-00002.safetensors
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:25ad3c5f1e8f149f0cf17555f2850072f0bbef27e4554f7cf4d26fc7931f3673
 size 2885023544
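Note on the Git LFS pointers above: each pointer records a hash algorithm, a digest, and a byte size, which is enough to check a downloaded file. A minimal sketch using only the standard library follows; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the file path passed to `matches_pointer` is whatever local copy is being checked.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.strip().split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict) -> bool:
    """Stream the file in 1 MiB chunks and compare size and digest."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:25ad3c5f1e8f149f0cf17555f2850072f0bbef27e4554f7cf4d26fc7931f3673\n"
    "size 2885023544\n"
)
print(pointer["algo"], pointer["size"])  # sha256 2885023544
```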
--- a/model.safetensors.index.json
+++ b/model.safetensors.index.json
@@ -0,0 +1,299 @@
 {
  "metadata": {
    "total_parameters": 3933637120,
    "total_size": 7867274240
  },
  "weight_map": {
    "lm_head.weight": "model-00002-of-00002.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.19.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.19.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.20.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.20.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.20.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.20.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.20.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.21.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.21.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.21.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.21.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.21.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.21.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.21.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.22.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.22.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.22.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.22.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.23.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.23.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.23.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.23.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.23.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.23.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.23.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.24.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.24.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.24.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.24.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.24.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.25.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.26.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.26.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.26.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.26.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.26.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.26.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.26.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.26.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.26.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.27.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.27.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.27.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.27.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.27.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.27.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.27.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.27.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.27.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.28.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.28.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.28.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.28.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.28.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.28.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.28.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.28.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.28.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.29.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.29.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.29.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.29.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.29.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.29.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.29.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.30.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.30.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.30.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.30.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.30.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.30.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.30.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.30.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.30.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.31.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.31.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.31.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.31.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.31.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.31.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.31.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.31.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.31.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.norm.weight": "model-00002-of-00002.safetensors"
  }
 }
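Note on the weight map above: the shard boundary does not fall on a layer boundary, so layer 21 has tensors in both files. The sketch below groups tensor names by layer index and reports layers that span more than one shard. It runs on a five-entry excerpt copied from the map; the function name `shards_by_layer` is hypothetical.

```python
from collections import defaultdict


def shards_by_layer(weight_map: dict) -> dict:
    """Map each decoder layer index to the set of shard files holding its tensors."""
    layers = defaultdict(set)
    for name, shard in weight_map.items():
        parts = name.split(".")
        if parts[:2] == ["model", "layers"]:
            layers[int(parts[2])].add(shard)
    return dict(layers)


# Excerpt of weight_map; the full map has the same shape.
excerpt = {
    "model.layers.20.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.21.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.norm.weight": "model-00002-of-00002.safetensors",
}
split_layers = sorted(
    layer for layer, shards in shards_by_layer(excerpt).items() if len(shards) > 1
)
print(split_layers)  # [21]
```

A loader that streams one layer at a time would therefore need both shard files open while reading layer 21.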
--- a/special_tokens_map.json
+++ b/special_tokens_map.json
@@ -0,0 +1,33 @@
 {
  "additional_special_tokens": [
    "<|endoftext|>"
  ],
  "bos_token": {
    "content": "<|im_start|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "<|im_end|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
 }
--- a/tokenizer.json
+++ b/tokenizer.json
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:1d8f0326910136aca20831249220b38ce5299527647bc8c6b65404485c479740
 size 18451122
--- a/tokenizer.model
+++ b/tokenizer.model
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:fb41d04798b714520a9b075727b0226538b7330254299062742c50ec8374bc36
 size 2782298
--- a/tokenizer_config.json
+++ b/tokenizer_config.json
@@ -0,0 +1,102 @@
 {
  "add_bos_token": true,
  "add_eos_token": false,
  "add_prefix_space": true,
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "166100": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "166101": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "166102": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "166103": {
      "content": "<think>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "166104": {
      "content": "</think>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "166105": {
      "content": "<tool_call>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "166106": {
      "content": "</tool_call>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  },
  "additional_special_tokens": [
    "<|endoftext|>"
  ],
  "bos_token": "<|im_start|>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|im_end|>",
  "extra_special_tokens": {},
  "legacy": true,
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "<unk>",
  "sp_model_kwargs": {},
  "spaces_between_special_tokens": false,
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": "<unk>",
  "use_default_system_prompt": false
 }
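Note on tokenizer_config.json: the `bos_token` and `eos_token` strings here should resolve to the same ids that config.json and generation_config.json declare (166100 and 166101), and `pad_token` to id 0. A small consistency check is sketched below on an excerpt of `added_tokens_decoder`; the helper name `token_id` is hypothetical.

```python
def token_id(added_tokens_decoder: dict, content: str) -> int:
    """Look up the integer id whose entry has the given token content."""
    for tid, entry in added_tokens_decoder.items():
        if entry["content"] == content:
            return int(tid)
    raise KeyError(content)


# Excerpt of added_tokens_decoder, keeping only the field used here.
decoder = {
    "0": {"content": "<unk>"},
    "166100": {"content": "<|im_start|>"},
    "166101": {"content": "<|im_end|>"},
}
expected = {"<|im_start|>": 166100, "<|im_end|>": 166101, "<unk>": 0}
mismatches = [tok for tok, tid in expected.items() if token_id(decoder, tok) != tid]
print(mismatches)  # []
```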